Methods and compositions for creating altered and improved cells and organisms

ABSTRACT

The present invention relates to compositions involving randomized in-frame fusion polynucleotides and their introduction into a host organism to identify desirable phenotypic changes. The present invention further relates to methods of generating these randomized in-frame fusion polynucleotides by introducing randomized in-frame fusion polynucleotides into an organism, selecting for organisms with new or altered phenotypes, re-isolating the randomized in-frame fusion polynucleotides from the selected organisms, re-assembling the constituent polynucleotides of the re-isolated randomized in-frame fusion polynucleotides into new collections of randomized in-frame fusion polynucleotides, and repeating the selection for organisms with new or altered phenotypes.

BACKGROUND

Numerous agricultural and industrial production systems and processes depend on specific organisms, such as plants, algae, bacteria, fungi, yeasts, protozoa and cultured animal cells, for production of useful materials and compounds, such as food, fiber, structural materials, fuel, chemicals, pharmaceuticals, or feedstocks. In the process of the current shift to biological production systems for a variety of chemicals and fuels, a wide assortment of organisms will be used for production, most of them microbes, with an increasing tendency towards photosynthetic organisms (Dismukes 2008). The ability to grow robustly, and the ability to efficiently produce the materials and compounds of interest, are desirable properties of these organisms.

Optimization of the growth of these organisms and augmentation of their yield of useful materials and compounds is an ongoing activity of many companies and individuals, with the goal of achieving a higher productivity or yield, or lower production cost of commercially important materials and compounds. Such improvements can occur through the modification of production systems, or through the modification of the organisms themselves.

Polynucleotide fusion, the joining of intact or partial open reading frames encoded by separate polynucleotides, is a known way of altering a polynucleotide sequence to change the properties of the encoded RNA or protein and to alter the phenotype of an organism. There are two general mechanisms by which polynucleotide fusions can alter an organism's phenotype. These two mechanisms can be illustrated with the case of polynucleotide A (encoding protein A′) fused to polynucleotide B (encoding protein B′), in which proteins A′ and B′ have different functions or activities and/or are localized to different parts of the cell. The first mechanism applies to sub-cellular localization of the two proteins. The fusion protein encoded by the fusion of the two polynucleotides may be localized to the part of the cell where protein A′ normally resides, or to the part of the cell where protein B′ normally resides, or to both. This alteration of cellular distribution of the activities encoded by proteins A′ and B′ may cause a phenotypic change in the organism.

The second general mechanism by which fusion proteins alter the phenotypic property of a cell or organism relates to the direct association of two different, normally separate functions or activities in the same protein. In the case of proteins A′ and B′, their fusion may lead to an altered activity of protein A′ or of protein B′ or of the multiprotein complex in which these proteins normally reside, or of combinations thereof. The altered activity includes but is not limited to: qualitative alterations in activity; altered levels of activity; altered specificities of activity; altered regulation of the activity by the cell; altered association of the protein with other proteins, DNA or RNA molecules in the cell, leading to changes in the cell's biochemical or genetic pathways. As a result, a system for creating artificial polynucleotide fusions has the potential to create many phenotypes that are rarely or never found in nature.

To date, no attempt has been made to take advantage of the function-generating capability of fusion polypeptides in a large-scale and systematic manner. There are no published examples of large-scale collections of randomized, in-frame polynucleotide fusions. Previous examples of fusion proteins have been generated in a limited and directed fashion with specific outcomes in mind. The present invention describes the creation and use of systematic, randomized, large-scale and in-frame polynucleotide fusions for the purpose of altering protein function, generating new protein functions, and/or generating novel phenotypes of interest in biological organisms.

The present invention also describes methods by which large-scale collections of randomized, in-frame fusion polynucleotides can be selected in an iterative fashion to arrive at smaller collections of in-frame fusion polynucleotides enriched for a particular function or ability to confer a phenotype of interest to an organism.

BRIEF SUMMARY OF THE INVENTION

The present invention relates to methods and compositions that bring about changes in phenotypes in an organism through the introduction of randomized in-frame fusion polynucleotides into the genome of the organism. The random association of multiple sequences results in randomized in-frame fusion polynucleotides that disrupt or alter existing genetic or biochemical mechanisms or pathways in the cell or organism, thus creating novel characteristics of the transformed cell or organism. This method is useful for increasing diversity within populations of organisms, and creating new and useful phenotypes or characteristics in such organisms.

The present invention uses randomized in-frame fusion polynucleotides to create previously unknown phenotypes, or enhance existing phenotypes, in a target cell or organism. The present invention is directed to a composition comprising at least 2 discrete random polynucleotides randomly fused in-frame to form at least one randomized in-frame fusion polynucleotide. The randomized in-frame fusion polynucleotide can be operably linked to at least one regulatory sequence that controls expression of the randomized in-frame fusion polynucleotide where the regulatory sequence is a promoter, a terminator, or an untranslated sequence. In one embodiment, the randomized in-frame fusion polynucleotide is operably linked to a vector. The randomized in-frame fusion polynucleotide can be introduced into a host cell. In some cases the host cell can be regenerated into the organism from which the host cell was derived. The randomized fusion polypeptide causes a phenotype that is not present in a control cell or a control organism.

The invention is also directed to large-scale methods of producing randomized in-frame fused polynucleotides by isolating polynucleotides from an organism and randomly joining the fragments in-frame. Another embodiment presents a method of altering the phenotype of a cell comprising introducing into a host cell the randomized in-frame fusion polynucleotide. Yet another embodiment presents a method for altering the phenotype of an organism by introducing a randomized in-frame fusion polypeptide into a host cell and then regenerating the organism from that cell. Yet another embodiment presents a method for identifying a randomized in-frame fusion polypeptide responsible for an altered phenotype by comparing the life cycle of the cell or organism containing the randomized in-frame fusion polypeptide to a control cell or organism, selecting the cell or organism containing the randomized in-frame fusion polypeptide that displays a phenotype absent in the control organism, isolating the randomized in-frame fusion polynucleotide encoding the randomized in-frame fusion polypeptide from the selected organism, introducing the isolated randomized in-frame fusion polynucleotide into another host cell and, if appropriate, regenerating the organism from that host cell, and then comparing the randomized in-frame fusion polynucleotide containing cell or regenerated organism to a control organism to confirm that the observed altered phenotype remains.

In some embodiments, a collection of coding sequences (open reading frames or ORFs) is generated, and random pairs of ORFs are cloned into an expression vector as randomized translational fusions. This is done in a manner that each ORF present in the starting collection can be positioned in a 5′ orientation with respect to the ORF it is fused to, or in a 3′ orientation. The resulting library of randomized in-frame fusion polynucleotides is introduced into a target organism, and transformed cells or organisms are selected for the presence of the randomized in-frame fusion polynucleotide. In another embodiment, populations of transformed organisms are selected or screened for a novel phenotype. Transformed organisms with the desirable phenotype are of direct utility in a process that the target organism is typically used for.

The large-scale collections of randomized, in-frame fusion polynucleotides described can also be selected in an iterative fashion to arrive at smaller collections of in-frame fusion polynucleotides enriched for a particular function or ability to confer a phenotype of interest to an organism. Such enrichments can be performed in a manner that the in-frame fusion polynucleotides isolated at the end of each round of selection are kept intact. Alternatively, the enrichment is performed in a manner where the component sequences represented within the in-frame fusion polynucleotides isolated at the end of each round of selection are recombined with each other to arrive at potentially new combinations of sequences. This method may enrich for sequence combinations that may not have been represented at high levels in the starting collection. Such iterative procedures of introducing collections of randomized, in-frame fusion polynucleotides into an organism, performing a functional selection on the organism, re-isolating the in-frame fusion polynucleotides from the population of organisms present at the end of the selection, optionally recombining the polynucleotides present in this re-isolated population of fusion polynucleotides, and then repeating the procedure, is a very powerful way of obtaining polynucleotides capable of conferring specific phenotypes of interest.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1: Example of randomizing a collection of ORFs into randomized in-frame fusion polynucleotides and using these to alter an organism phenotypically. A collection of ORFs (A) is combined with a vector DNA molecule (B) in a manner where ORFs are combined in a randomized pairwise fashion, resulting in a large collection of randomized fused ORFs (C). The vector molecule in this example contains sequences mediating expression of the ORFs (double lines). The collection of randomized in-frame fusion polynucleotides is introduced into an organism (D), and transformants are isolated (E), some of which have altered phenotypes. Modified organisms with phenotypes of interest (F) are isolated from this population.

FIG. 2: Example of assembling two ORFs into an expression vector in a single step by homology-dependent cloning. (A) A 5′ ORF and a 3′ ORF are PCR amplified using sequence-specific primers (P1, P2, P3, P4). Each primer contains an extra sequence at its 5′ end that specifies homology to sequences in the other ORF or in the vector, corresponding to the order in which the fragments are to be assembled (see B). (B) The PCR-amplified ORFs containing the sequences homologous to each other and to the cloning vector. (C) The cloning vector is prepared to receive the PCR-amplified ORFs. (D) The PCR-amplified ORFs are combined with the cloning vector and assembled into a final construct by allowing the regions of homology between the three fragments to direct each fragment into the correct position and orientation. For simplicity, the Figure shows a single 5′ ORF and a single 3′ ORF, but the same method will work with mixtures containing any number of ORFs.
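
By way of non-limiting illustration, the primer arrangement of FIG. 2 can be sketched in Python as follows. The function name design_primers, the annealing length, the homology length and all sequences are hypothetical choices made for this sketch and are not taken from the figure; a practical design would also take melting temperature and secondary structure into account.

```python
# Sketch of primers P1-P4 for homology-dependent assembly of two ORFs.
# Lengths and sequences are illustrative assumptions only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(orf5, orf3, vector_left, vector_right, anneal=20, overlap=15):
    """Return (P1, P2, P3, P4), each written 5' to 3'.

    Every primer carries a 5' tail homologous to its neighbor in the
    final construct: vector_left | 5' ORF | 3' ORF | vector_right.
    """
    # P1: forward primer for the 5' ORF, tail matches the vector end upstream.
    p1 = vector_left[-overlap:] + orf5[:anneal]
    # P2: reverse primer for the 5' ORF, tail matches the start of the 3' ORF.
    p2 = revcomp(orf5[-anneal:] + orf3[:overlap])
    # P3: forward primer for the 3' ORF, tail matches the end of the 5' ORF.
    p3 = orf5[-overlap:] + orf3[:anneal]
    # P4: reverse primer for the 3' ORF, tail matches the vector end downstream.
    p4 = revcomp(orf3[-anneal:] + vector_right[:overlap])
    return p1, p2, p3, p4


# Toy sequences; the 5' ORF is given without a stop codon.
p1, p2, p3, p4 = design_primers(
    "ATGGCTAGCAAAGGT", "ATGTCTGATCAGGAA",
    "CCCCCCGAATTC", "GGATCCTTTTTT",
    anneal=10, overlap=6,
)
print(p1, p2, p3, p4)
```

Because the homology tails are derived from whichever sequences flank each junction, the same function could in principle be applied to every ORF in a mixture.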

FIG. 3: Schematic representation of iterative selection of fusion genes to arrive at genes with high activity. A collection of 5′ ORFs (A) and 3′ ORFs (B) are combined with an expression vector molecule (C) in a manner that ORFs are combined in a randomized pairwise fashion, resulting in a large randomized collection of paired ORFs fused in-frame (D). The vector molecule in this example contains sequences mediating expression of the ORFs (double lines). The collection of randomized in-frame fusion polynucleotides is introduced into an organism (E), and transformants are isolated, some of which have altered phenotypes. Modified organisms with phenotypes of interest are isolated from this population (F). The randomized fusion polynucleotides expressed in transformants with altered phenotypes are re-isolated (G), re-transformed into the original cell population (H) and selected for a desirable phenotype (I), resulting in a smaller collection of in-frame fusion polynucleotides conferring the desirable phenotype (K). The steps of re-isolation (G), re-transformation (H) and re-selection (I) can be repeated one or more additional times if necessary (J). Multiple phenotypes can be selected for in the course of this iterative procedure. At the end of this iterative selection procedure, individual active fusion polynucleotides are obtained (K) that reproducibly confer the phenotype of interest.

FIG. 4: Schematic representation of iterative selection of in-frame fusion polynucleotides accompanied by recombination of selected in-frame fusion polynucleotides to arrive at proteins conferring a phenotype of interest. A collection of 5′ ORFs (A) and 3′ ORFs (B) are combined with an expression vector molecule (C) in a manner where the ORFs are combined in a randomized pairwise fashion, resulting in a large randomized collection of paired ORFs fused in-frame (D). The vector molecule in this example contains sequences mediating expression of the ORFs (double lines). The collection of randomized in-frame fusion polynucleotides (D) is introduced into an organism (E), and transformants are isolated, some of which have altered phenotypes. Modified organisms with phenotypes of interest (F) are isolated from this population and the randomized in-frame fusion polynucleotides are purified (G) from the selected transformants. The 5′ ORFs (H) and 3′ ORFs (I) contained therein are re-isolated. These selected ORF sets are assembled together with an expression vector molecule to result in a new and smaller collection of randomized in-frame fusion polynucleotides (J). The new collection is re-transformed into the organism (K), transformants are selected for a desirable phenotype (L), and the randomized in-frame fusion polynucleotides (M) are isolated from the selected transformants, resulting in a smaller collection of randomized in-frame fusion polynucleotides conferring the desirable phenotype. The steps of re-isolation (G, H, I), re-assembly of a randomized fusion polynucleotide library (J), re-transformation (K), re-selection (L) and subsequent isolation of the resulting randomized in-frame fusion polynucleotide (M) can be repeated one or more additional times if desired (N). Multiple phenotypes can be selected for during the course of this iterative procedure. At the end of the iterative selection procedure, individual randomized in-frame fusion polynucleotides are obtained (M) that reproducibly confer the phenotype of interest.

FIG. 5: Saccharomyces cerevisiae culture plates showing the results of cell survival assays performed on individual, cloned in-frame fusion polynucleotides to test for heat, salt, ethanol, butanol and low pH tolerance of yeast cells transformed with randomized in-frame fusion polynucleotides. Yeast strain BY4741, transformed with 16 different in-frame fusion polynucleotides and the p416-GAL1 control plasmid, was cultured in triplicate at high temperature or in the presence of selective agents. After growth under selective conditions, a portion of each culture was diluted 1:10 in fresh medium, and 3 μl of the diluted and undiluted culture were spotted onto fresh medium and allowed to grow for 2 days. The selective agents were (A), ethanol; (B), butanol; (C), heat; (D), salt; (E), low pH. A map of the plate identifying all clones tested and their relative position in the plate is shown (F). The data shown in this figure is a subset of the data represented in Table 1.

FIG. 6: Escherichia coli culture plates showing the results of cell survival assays performed on individual, cloned in-frame fusion polynucleotides to test for salt and heat tolerance of cells transformed with each clone. E. coli strain EC100 (Epicentre Technologies), transformed with 47 different in-frame fusion polynucleotides and the modified pUC19 control plasmid was cultured in the presence of 2.5 M NaCl (A) or grown at 48° C. (B). After growth under selective conditions, each culture was diluted 1:10 in fresh medium, and 3 μl of the diluted and undiluted culture were spotted onto fresh medium and allowed to grow for 16 hours. A map of the plate identifying all clones tested and their relative position in the plate is shown (F). The data shown in this figure is a subset of the data represented in Table 3.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

Composite open reading frame: As used herein, a composite open reading frame results from the in-frame fusion of at least two different starting open reading frames, resulting in a new open reading frame comprising all starting open reading frames and encoding a fusion protein comprising the sequences encoded by all starting open reading frames.

Degenerate Sequence: In this application degenerate sequences are defined as populations of sequences where specific sequence positions differ between different molecules or clones in the population. The sequence differences may be a single nucleotide or multiple nucleotides of any number, examples being 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900 or 1000 nucleotides, or any number in between. Sequence differences in a degenerate sequence may involve the presence of 2, 3 or 4 different nucleotides in that position within the population of sequences, molecules or clones. Examples of degenerate nucleotides in a specific position of a sequence are: A or C; A or G; A or T; C or G; C or T; G or T; A, C or G; A, C or T; A, G or T; C, G or T; A, C, G or T.
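
By way of illustration only, a degenerate sequence can be expanded into the population of sequences it represents with a short Python sketch. The one-letter symbols used below are the standard IUPAC nucleotide ambiguity codes, which are an assumption of this sketch: the definition above lists the base combinations but does not use these symbols. The function names are likewise hypothetical.

```python
from itertools import product

# IUPAC ambiguity codes covering the 2-, 3- and 4-base combinations
# listed in the definition (notation assumed for this sketch).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def expand_degenerate(seq):
    """List every concrete sequence represented by a degenerate sequence."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


def population_size(seq):
    """Count the represented sequences without enumerating them."""
    size = 1
    for base in seq.upper():
        size *= len(IUPAC[base])
    return size


variants = expand_degenerate("ATGRNC")  # R = A or G; N = A, C, G or T
print(len(variants))  # 8 distinct sequences (2 x 4)
```

For long runs of degenerate positions the population grows multiplicatively, so population_size is preferable to full enumeration.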

Discrete Random Polynucleotide: A discrete random polynucleotide refers to a specific polynucleotide within a mixed collection of polynucleotides, chosen randomly from the collection.

Full-length Open Reading Frame: As used herein, a full-length open reading frame refers to an open reading frame encoding a full-length protein which extends from its natural initiation codon to its natural final amino-acid coding codon, as expressed in a cell or organism. In cases where a particular open reading frame sequence gives rise to multiple distinct full-length proteins expressed within a cell or an organism, each open reading frame within this sequence, encoding one of the multiple distinct proteins, can be considered full-length. A full-length open reading frame can be continuous or may be interrupted by introns.

Fusion polynucleotide: A fusion polynucleotide as used in this application refers to a polynucleotide that results from the operable joining of two separate and distinct polynucleotides into a single polynucleotide. In the context of this application, the term “in-frame fusion polynucleotide” is defined as a fusion polynucleotide encoding a fusion polypeptide.

Fusion polypeptide: A fusion polypeptide is an expression product resulting from the fusion of two or more open reading frames that originally coded for separate proteins.

In-Frame: The term “in-frame” in this application, and particularly in the phrase “in-frame fusion polynucleotide,” refers to the reading frame of codons in an upstream or 5′ polynucleotide or ORF as being the same as the reading frame of codons in a polynucleotide or ORF placed downstream or 3′ of the upstream polynucleotide or ORF that is fused with the upstream or 5′ polynucleotide or ORF. Such in-frame fusion polynucleotides typically encode a fusion protein or fusion peptide encoded by both the 5′ polynucleotide and the 3′ polynucleotide. Collections of such in-frame fusion polynucleotides can vary in the percentage of fusion polynucleotides that contain upstream and downstream polynucleotides that are in-frame with respect to one another. The percentage in the total collection is at least 10% and can number 10%, 11%, 12%, 13%, 14%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% or any number in between.
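
As a non-limiting sketch, the in-frame condition and the percentage of in-frame members of a collection can be expressed in Python as shown below. The sketch assumes that the upstream ORF begins at the first nucleotide of its first codon, that an optional linker may separate the two ORFs, and that the standard stop codons apply; the function names and test sequences are hypothetical.

```python
# Standard stop codons (assumed; alternative genetic codes would differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(upstream_orf, linker=""):
    """Return True if a downstream ORF placed directly after
    upstream_orf + linker is read in the upstream reading frame.

    The joined length must be a multiple of three, and no stop codon may
    be read before the junction.
    """
    joined = (upstream_orf + linker).upper()
    if len(joined) % 3 != 0:
        return False
    codons = [joined[i:i + 3] for i in range(0, len(joined), 3)]
    return not any(codon in STOP_CODONS for codon in codons)


def percent_in_frame(fusions):
    """fusions: list of (upstream_orf, linker) pairs. Returns a percentage."""
    if not fusions:
        return 0.0
    hits = sum(is_in_frame(upstream, linker) for upstream, linker in fusions)
    return 100.0 * hits / len(fusions)


collection = [
    ("ATGGCC", ""),      # in frame
    ("ATGGCC", "GGA"),   # in frame, 3-nt linker
    ("ATGGC", ""),       # frame shifted by a missing nucleotide
    ("ATGTAA", ""),      # stop codon before the junction
]
print(percent_in_frame(collection))  # 50.0
```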

Iterate/Iterative: In this application, to iterate means to apply a method or procedure repeatedly to a material or sample. Typically, the processed, altered or modified material or sample produced from each round of processing, alteration or modification is then used as the starting material for the next round of processing, alteration or modification. Iterative selection refers to a selection process that iterates or repeats the selection two or more times, using the survivors of one round of selection as starting material for the subsequent rounds.
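
Purely as an illustration of the bookkeeping involved, iterative selection can be sketched in Python as shown below. The fitness function, the threshold and the ORF scores are hypothetical stand-ins for a biological selection; the optional recombination step corresponds to re-pairing the 5′ and 3′ ORFs recovered from survivors, and the sketch makes no claim about how any actual selection behaves.

```python
import random


def select(library, fitness, threshold):
    """One round of selection: keep fusions whose score meets the threshold."""
    return [fusion for fusion in library if fitness(fusion) >= threshold]


def recombine(survivors, rng):
    """Randomly re-pair the 5' and 3' ORFs recovered from the survivors."""
    five_prime = [fusion[0] for fusion in survivors]
    three_prime = [fusion[1] for fusion in survivors]
    return [(rng.choice(five_prime), rng.choice(three_prime)) for _ in survivors]


def iterative_selection(library, fitness, threshold, rounds,
                        recombine_between=False, seed=0):
    """Apply selection repeatedly; survivors of each round seed the next."""
    rng = random.Random(seed)
    population = list(library)
    for round_index in range(rounds):
        population = select(population, fitness, threshold)
        if not population:
            break
        # Recombine only between rounds, so the final output is selected.
        if recombine_between and round_index < rounds - 1:
            population = recombine(population, rng)
    return population


# Hypothetical per-ORF scores; a fusion's score is the sum of its parts.
scores = {"A": 3, "B": 1, "C": 2}


def toy_fitness(fusion):
    return scores[fusion[0]] + scores[fusion[1]]


library = [(a, b) for a in scores for b in scores]
kept_intact = iterative_selection(library, toy_fitness, threshold=4, rounds=2)
recombined = iterative_selection(library, toy_fitness, threshold=4, rounds=2,
                                 recombine_between=True)
print(len(kept_intact), len(recombined))
```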

Non-homologous: The term “non-homologous” in this application is defined as having sequence identity at the nucleotide level of less than 50%.

Open Reading Frame (ORF): An ORF is defined as any sequence of nucleotides in a nucleic acid that encodes a protein or peptide as a string of codons in a specific reading frame. Within this specific reading frame, an ORF can contain any codon specifying an amino acid, but does not contain a stop codon. The ORFs in the starting collection need not start or end with any particular amino acid. The ORF may be continuous or may be interrupted by introns.

Percentage of sequence identity: The term “percent sequence identity” refers to the degree of identity between any given query sequence, e.g. SEQ ID NO:102, and a subject sequence. A subject sequence typically has a length that is from about 80 percent to 200 percent of the length of the query sequence, e.g., 80, 82, 85, 87, 89, 90, 93, 95, 97, 99, 100, 105, 110, 115, 120, 130, 140, 150, 160, 170, 180, 190 or 200 percent of the length of the query sequence. A percent identity for any subject nucleic acid or polypeptide relative to a query nucleic acid or polypeptide can be determined as follows. A query sequence (e.g. a nucleic acid or amino acid sequence) is aligned to one or more subject nucleic acid or amino acid sequences using the computer program ClustalW (version 1.83, default parameters), which allows alignments of nucleic acid or protein sequences to be carried out across their entire length (global alignment, Chenna 2003).

ClustalW calculates the best match between a query and one or more subject sequences, and aligns them so that identities, similarities and differences can be determined. Gaps of one or more residues can be inserted into a query sequence, a subject sequence, or both, to maximize sequence alignments. For fast pairwise alignment of nucleic acid sequences, the following default parameters are used: word size: 2; window size: 4; scoring method: percentage; number of top diagonals: 4; and gap penalty: 5. For multiple alignment of nucleic acid sequences, the following parameters are used: gap opening penalty: 10.0; gap extension penalty: 5.0; and weight transitions: yes. For fast pairwise alignment of protein sequences, the following parameters are used: word size: 1; window size: 5; scoring method: percentage; number of top diagonals: 5; gap penalty: 3. For multiple alignment of protein sequences, the following parameters are used: weight matrix: blosum; gap opening penalty: 10.0; gap extension penalty: 0.05; hydrophilic gaps: on; hydrophilic residues: Gly, Pro, Ser, Asn, Asp, Gln, Glu, Arg, and Lys; residue-specific gap penalties: on. The ClustalW output is a sequence alignment that reflects the relationship between sequences. ClustalW can be run, for example, at the Baylor College of Medicine Search Launcher website and at the European Bioinformatics Institute website on the World Wide Web (ebi.ac.uk/clustalw).

To determine a percent identity of a subject nucleic acid or amino acid sequence to a query sequence, the sequences are aligned using ClustalW, the number of identical matches in the alignment is divided by the query length, and the result is multiplied by 100. It is noted that the percent identity value can be rounded to the nearest tenth. For example, 78.11, 78.12, 78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16, 78.17, 78.18, and 78.19 are rounded up to 78.2.
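
The calculation just described can be sketched in Python as follows, for illustration only. The sketch takes two rows of an already computed pairwise alignment (with "-" marking gaps) and does not call ClustalW itself. It assumes that "query length" means the number of residues in the query excluding gaps, and it uses decimal arithmetic so that values such as 78.15 round up as stated. The helper is_non_homologous applies the 50% nucleotide identity threshold from the definition of "non-homologous"; all function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_tenth(value):
    """Round a Decimal to the nearest tenth, with halves rounded up."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def percent_identity(aligned_query, aligned_subject):
    """Identical matches / query length x 100, rounded to the nearest tenth."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    pairs = zip(aligned_query.upper(), aligned_subject.upper())
    matches = sum(1 for q, s in pairs if q == s and q != "-")
    # Assumption: query length counts query residues only, not gap symbols.
    query_length = sum(1 for q in aligned_query if q != "-")
    if query_length == 0:
        raise ValueError("query contains no residues")
    raw = Decimal(matches) * 100 / Decimal(query_length)
    return float(round_to_tenth(raw))


def is_non_homologous(aligned_query, aligned_subject):
    """True if nucleotide identity is below the 50% threshold."""
    return percent_identity(aligned_query, aligned_subject) < 50.0


# 7 identical positions over a query of 8 residues.
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))  # 87.5
print(round_to_tenth(Decimal("78.15")))            # 78.2
```

Binary floating point cannot represent 78.15 exactly, which is why the rounding step is carried out on Decimal values rather than on floats.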

Phenotypic Value: Phenotypic value refers to a quantitative measure of a phenotype or a trait exhibited by an organism. For example, height measured in feet is a phenotypic value corresponding to body height in humans.

Random/Randomized: Made or chosen without method or conscious decision.

Randomized In-frame Fusion Polynucleotides: As used herein, this phrase refers to polynucleotides in one or more starting populations fused to each other in a random manner to form randomized fusion polynucleotides, each randomized fusion polynucleotide comprising two or more members of the starting population(s). The random nature of the fusion is such that the association between different polynucleotides capable of fusing is not deliberately biased or directed, so that each starting polynucleotide has an equal or similar probability to be represented in the final population of fusion polynucleotides, and that it has an equal or similar probability to be fused with any other member of the starting population(s).

Randomized Translational Fusion: A randomized translational fusion is a process by which polynucleotides are randomly fused in a manner that the ORFs specified by the individual polynucleotide sequences are fused in-frame, to result in a fusion polynucleotide that encodes a fusion protein.

Randomly Fused: The term “randomly fused” refers to a process by which a collection of fused polynucleotides is generated from one or more collections of starting polynucleotides, where each member of the starting polynucleotide collection(s) has an equal or similar probability of joining to each other member. The objective of generating randomly fused polynucleotides is typically to generate all possible combinations, or as many combinations as possible, of fused members or sequences.
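
As an illustrative sketch only, random fusion with equal probabilities can be modeled in Python by drawing the 5′ and 3′ members of each fusion independently and uniformly from the starting collection. The uniform draw is an idealization of the "equal or similar probability" described above, and the function names and ORF labels are hypothetical.

```python
import random
from collections import Counter


def random_fusion_library(orfs, n_clones, seed=0):
    """Draw n_clones ordered (5' ORF, 3' ORF) pairs uniformly at random."""
    rng = random.Random(seed)
    return [(rng.choice(orfs), rng.choice(orfs)) for _ in range(n_clones)]


def coverage(library, orfs):
    """Fraction of all possible ordered pairs seen at least once."""
    possible = len(orfs) ** 2
    return len(set(library)) / possible


def position_counts(library):
    """How often each ORF occupies the 5' and the 3' position."""
    five_prime = Counter(pair[0] for pair in library)
    three_prime = Counter(pair[1] for pair in library)
    return five_prime, three_prime


orfs = ["orf1", "orf2", "orf3"]
library = random_fusion_library(orfs, n_clones=1000)
print(coverage(library, orfs))
print(position_counts(library))
```

In this idealized model each ORF is expected to appear at a similar frequency in both positions, and coverage approaches 1 as the number of clones grows relative to the number of possible pairs.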

Stringency of selection: The term “stringency of selection” refers to selection intensity, or the degree to which selective conditions affect the probability of an organism surviving the selection. A higher stringency of selection implies a higher selection intensity, with lower survival rates expected; a lower stringency of selection implies a lower selection intensity, with higher survival rates expected. Survival of a particular organism or population of organisms under selection ultimately depends on the fitness or viability of that organism or population of organisms under the selective conditions.

Transformed: The term “transformed” means genetic modification by introduction of a polynucleotide sequence.

Transformed Organism: A transformed organism is an organism that has been genetically altered by introduction of a polynucleotide sequence into the organism's genome.

One embodiment of the present invention is directed to a method for screening and sampling a large number of biochemical, genetic and interactive functions for a desired phenotype. Another embodiment of the present invention discloses a novel method of producing altered or improved cells or organisms by creating randomized in-frame fusions of open reading frames (ORFs), or fragments thereof, to create large libraries of polynucleotide combinations, which generate novel phenotypes and characteristics in organisms. Yet another embodiment of the present invention is directed to methods to generate collections of randomized in-frame fusion polynucleotides.

A collection of ORFs is generated as separate DNA fragments, or separate sequences of larger DNA fragments. A library of randomized in-frame fusion polynucleotides is then generated from one or more collections or pools of polynucleotides containing ORFs by combining two or more random polynucleotides, or fragments thereof, in a manner such that the combined polynucleotides can be expressed in the target cell as a randomized in-frame fusion peptide or polypeptide. The library of randomized in-frame fusion polynucleotides is generated in a fashion that allows many or all of the possible sequence combinations to be formed. The library is then introduced into an organism and allowed to express. The resulting collection of organisms expressing the randomized in-frame fusion polynucleotides is screened and/or selected for desirable phenotypes or characteristics. The polynucleotides responsible for the changes in the properties of a specific transformant can be recovered and used repeatedly.

The general concept of this approach is illustrated in FIG. 1. As an example, all polynucleotides encoded by an organism can be used in the construction of the randomized in-frame fusion polynucleotide library. In the case of the laboratory bacterium E. coli, for example, the ORF encoding every one of the 5,286 proteins encoded by E. coli can be present in the initial collection of ORFs used to make the randomized in-frame fusion polynucleotide library. The randomized in-frame fusion polynucleotide library thus contains a very high number of polynucleotide combinations (5,286×5,286=2.8×10⁷ total combinations), and the likelihood that novel functions are present within this combinatorial set of polynucleotides is consequently high.

The polynucleotides used to make up the initial set of ORFs, or fragments thereof, can be from any source (genome, metagenome, cDNA, etc.) and can be any subset of polynucleotides from such a source, selected by sequence composition, function or other criteria. The method can thus be tailored to capture specific biochemical functions, or functions from specific source organisms or source environments.

The polynucleotides used to make up the initial set of ORFs will contain sequences that are primarily non-homologous and distinct from one another, as opposed to ORFs that share extensive sequence homology.

The ORFs in the starting collection can number at least 5, including at least 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 100, 200, 300, 400, 500, 1000, 2000, 3000, 4000, 5000, 10000, 20000, 30000, 40000, 50000, 100000, 200000, 300000, 400000, 500000, 1000000 or higher. The number of randomized in-frame fusion polynucleotides in the library typically equals at least the number of ORFs in the starting collection and can be as many as the square of the number of ORFs in the starting collection, which would be the expected number of all possible polynucleotide combinations assuming that each ORF is present in both possible positions (5′ and 3′) and in combination with each other ORF. A library generated from fragments of ORFs would be expected to contain an even greater number of combinations.
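
For illustration, the bounds just described for a pairwise fusion library can be computed with a minimal Python sketch. The function name is hypothetical, and the example value of 5,286 ORFs is the E. coli figure given above.

```python
def library_size_bounds(n_orfs):
    """Return (minimum, maximum) numbers of distinct pairwise fusions.

    The minimum equals the number of starting ORFs; the maximum is its
    square, which counts every ORF in both the 5' and 3' positions in
    combination with every ORF in the collection.
    """
    return n_orfs, n_orfs ** 2


low, high = library_size_bounds(5286)
print(low, high, f"{high:.1e}")  # 5286 27941796 2.8e+07
```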

The ORFs in the starting collection can be derived from a single organism or from multiple organisms. Potential sources of the ORFs include, but are not limited to, random pieces of genomic DNA or amplified genomic DNA from any virus, bacterium, archaeon, prokaryote, eukaryote, protozoan, yeast, fungus, animal, alga or plant or mixed population thereof; bacterial ORFs present as complete or partial collections or pools of protein-coding sequences derived from the genomes of one or more bacteria, archaea or other prokaryotes; collections of cDNAs present as individual clones or pools of protein-coding sequences from bacteria, archaea, any prokaryote or any eukaryotic organism; randomized or partially randomized oligonucleotides; partially or fully random DNA sequences derived from randomized oligonucleotides by amplification.

The ORFs in the starting collection can comprise the entire collection of ORFs from an organism's genome, or a fraction thereof. The ORFs in a collection or pool can be pre-selected based on known function, sequence composition, sequence content, sequence homology, amino acid composition of the encoded proteins, sequence homology of the encoded proteins, length, presence of specific motifs, charge, hydrophobicity, isoelectric point, 3-dimensional structure or fold, ability to associate with other proteins, or any other property.

The ORFs in the starting collection can contain natural sequences or mutagenized sequences, including known variants of certain polynucleotides known to have a gain or loss of function, or an altered function. They can also contain degenerate sequences or sequences altered by mutagenesis. Multiple, degenerate nucleotides may be adjacent or separated by constant or fixed sequences that are not degenerate. The ORFs in the starting collection can be free of introns, such as the ORFs typically found in prokaryotes, or they may contain introns as are typically found in the ORFs of eukaryotes.

The ORFs in the starting collection can be derived from PCR fragments, PCR fragment pools, cDNAs, random pieces of genomic DNA, synthetic DNA, cloned DNA, DNA isolated directly from source organisms or from the environment, or from any other source, or any combination of sources.

The ORFs in a starting collection can be added in molar amounts corresponding to the concentrations of other ORFs, or in lower or higher amounts that change their representation within the final randomized in-frame fusion polynucleotide library. For example, if a polynucleotide coding for a specific protein conferring a desirable phenotype is suspected to have a particularly high chance of conferring that phenotype in a target organism, it is possible to over-represent this sequence in the ORF collection to ensure that most or all in-frame fusion polynucleotide combinations are tested in combination with the prioritized sequence.
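The effect of over-representing a prioritized ORF can be estimated as follows. This is a minimal sketch under stated assumptions that are not part of the method itself: each position (5′ and 3′) of a two-ORF fusion is filled independently, and the probability of drawing an ORF is proportional to its molar amount. The function name and parameters are hypothetical.

```python
def fraction_with_priority_orf(n_orfs, fold_excess):
    """Estimate the fraction of two-ORF fusions containing a prioritized ORF.

    n_orfs      -- total number of ORFs in the collection, including the
                   prioritized one
    fold_excess -- molar amount of the prioritized ORF relative to each of
                   the other ORFs (1.0 means equimolar)

    Assumes independent, amount-proportional sampling at each position.
    """
    # Probability that a single position is occupied by the prioritized ORF.
    p = fold_excess / (fold_excess + (n_orfs - 1))
    # Probability that at least one of the two positions contains it.
    return 1.0 - (1.0 - p) ** 2


print(fraction_with_priority_orf(100, 1.0))   # equimolar
print(fraction_with_priority_orf(100, 99.0))  # 99-fold molar excess
```

Under these assumptions, an equimolar ORF in a collection of 100 appears in about 2% of fusions, whereas a 99-fold excess places it in about 75% of fusions.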

The randomized in-frame fusion polypeptides can be designed in a manner that the ORFs are fused directly to each other, without any sequence inserted between the final codon of the upstream 5′ORF and the first codon of the downstream 3′ORF (or the other way around). Alternatively, the randomized in-frame fusion polypeptides are designed to have sequence insertions that encode additional amino acids between the two ORFs. These sequence insertions can range between 1 and 1000 codons in length, and encode “linker” peptide or polypeptide sequences that are suitable for separating the two parts of the randomized in-frame fusion polynucleotide. Small amino acids, such as glycine, alanine, serine, proline, threonine, aspartic acid or asparagine are suitable for linker peptides because they tend to form flexible and unstructured domains, or alpha-helical domains lacking bulky side groups. This allows separation between the two parts of the encoded randomized fusion polypeptide and allows each part of the encoded randomized fusion polypeptide to move independently relative to the other. Accordingly, sequence insertions separating the two fused ORFs may contain codons specifying these amino acids. Alternatively, the linker peptide sequence may be designed to contain a specific secondary structure, such as an alpha helix, beta sheet, coiled coil or turn, or combinations thereof, which permit the two domains of the encoded randomized fusion polypeptide to be separated by a specific structure or combinations of specific structures.
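The joining of two ORFs with or without a linker can be sketched in code. The following Python example is illustrative only: the ORF sequences are hypothetical placeholders, the linker encodes Gly-Gly-Gly-Ser as one possible flexible linker, and removal of the stop codon of the upstream ORF is an assumption made here so that translation continues into the downstream ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(orf5, orf3, linker=""):
    """Join a 5' ORF and a 3' ORF, optionally separated by a linker.

    All three sequences must have a length divisible by 3 so that the reading
    frame of the 3' ORF is preserved. A terminal stop codon on the 5' ORF is
    removed before joining.
    """
    for name, seq in (("5' ORF", orf5), ("3' ORF", orf3), ("linker", linker)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} length is not a multiple of 3")
    if orf5[-3:].upper() in STOP_CODONS:
        orf5 = orf5[:-3]
    fused = orf5 + linker + orf3
    # Check that no stop codon interrupts the fusion before its final codon.
    codons = [fused[i:i + 3].upper() for i in range(0, len(fused), 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("internal in-frame stop codon in fusion")
    return fused


orf5 = "ATGGCTAAATAA"      # hypothetical 5' ORF: Met-Ala-Lys-stop
orf3 = "ATGTCTGAATGA"      # hypothetical 3' ORF: Met-Ser-Glu-stop
linker = "GGTGGTGGTTCT"    # Gly-Gly-Gly-Ser

direct = fuse_in_frame(orf5, orf3)
with_linker = fuse_in_frame(orf5, orf3, linker)
print(direct)
print(with_linker)
```

The direct fusion corresponds to the case in which no sequence is inserted between the two ORFs; the second call inserts four linker codons, within the range of 1 to 1000 codons described above.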

Each ORF can be generated to contain conserved 5′ and 3′ flanking sequences that match those at the 5′ and 3′ ends of other ORFs in the starting collection. These sequences are not part of the natural ORF and allow the ORFs to be amplified, cloned, isolated, and/or joined to other ORFs or to pieces of vector DNA. The conserved 5′ and 3′ flanking sequences can contain restriction sites, recombination sites, or any other sequence that permits specific joining to other ORFs, to vector sequences, or other sequences aiding in the transfer of the randomized in-frame fusion polynucleotide into an organism, replication within that organism, stability in that organism, or expression within that organism.

The ORFs in the starting collection can be full-length ORFs or partial ORFs and can range in size from 15 nucleotides to 100,000 nucleotides.

The ORFs in the starting collection can be configured to allow them to be placed at the 5′ end of the resulting randomized in-frame fusion polynucleotide, or in the middle, or at the 3′ end, or randomly at either position. The conserved sequences at the ends of the ORFs can be designed to allow such specific or non-specific placement. The library of randomized in-frame fusion polynucleotides may contain the same collection of ORFs at the 5′ end, in the middle, and at the 3′ end, or distinct collections of ORFs at each position.

The randomized in-frame fusion polynucleotides can be generated by a variety of methods for joining or cloning DNA molecules known to those skilled in the art including, but not limited to, traditional cloning using restriction enzymes and DNA ligase (ligation-dependent cloning), agarose gel-free cloning, ligation-independent (or ligation-free) cloning, site-specific recombination, homology-dependent cloning, recombinational cloning, homology-dependent end joining, annealing of single-stranded ends, linker tailing, topoisomerase-catalyzed cloning, enzyme-free cloning, and others. “Joining nucleic acid molecules” as used herein refers to any method that results in the molecules being operably linked at room temperature. Such methods include, but are not limited to, covalent linkage (ligation), annealing of complementary strands of nucleic acid molecules and other ways of associating two or more nucleic acid molecules.

In a specific embodiment of the invention, homologous sequences at the ends of the 5′ and 3′ ORFs to be joined can be used to direct or mediate the joining event. A large number of methods exist that can be used to accomplish such homology-dependent assembly (Lobban 1973), including linker tailing (Lathe 1984) or derivatives thereof (da Costa 1998, Liu 2010), In-Fusion cloning (Zhu 2007, Irwin 2012), Sequence and Ligation-Independent Cloning (SLIC, Li 2007, Li 2012), FastCloning (Li 2011), Circular Polymerase Extension Cloning (Quan 2009, Quan 2011), the Gibson assembly method (Gibson 2009, Gibson 2010), Quick and Clean Cloning (Thieme 2011), and others (Vroom 2008). FIG. 2 shows an example of how homologous end sequence can direct construction of a precisely assembled circular molecule from three linear starting fragments.

Randomized in-frame fusion polynucleotides of this sort can impart new functions to an organism and change the organism's phenotype(s) in many different manners. To achieve such a change of phenotype, the library of randomized in-frame fusion polynucleotides is transformed into a target organism. The target organism can be the source organism of some or all of the ORFs or ORF fragments used to make the randomized in-frame fusion polynucleotide library, or it can be a different organism. Target organisms include but are not limited to: E. coli, yeast, any species of bacteria, archaea, yeast, fungi, algae, cultured algal cells, insects, nematodes, vertebrates, animals, cultured animal cells, plants, or cultured plant cells. The target organism is generally an organism which is used for specific purposes, including, but not limited to, use in industry or agriculture, or in the production of chemicals, foods, fibers, structural materials, fuels, pharmaceuticals, agrochemicals, dyes, cosmetics or other useful substances.

Transformants of the target organism are generated which express members of the randomized in-frame fusion polynucleotide library. The transformants are selected or screened for presence of the randomized in-frame fusion polynucleotides encoding the randomized fusion polypeptides, and allowed to express the polypeptides. The population of transformants is then selected or screened for any observable, selectable or measurable phenotype. Such phenotypes include, but are not limited to, changes or alterations in the following properties: growth rate; rate of cell division; generation time; size; color; texture; morphology; population density; productivity; yield; shape; growth habit; composition; metabolism; uptake or utilization of nutrients, minerals, salts, ions, toxins or water; photosynthetic efficiency; sensitivity to or resistance to abiotic stresses such as temperature, osmotic strength, salinity, pH, electromagnetic radiation, organic solvents, oxidation, oxidizing agents, detergents, drought, wind, desiccation, flood, nutrient limitation, starvation, oxygen limitation, light, pressure, compaction, shear or ionizing radiation; tolerance or resistance to biotic stresses such as diseases, pests, phages, viruses, infective agents, parasites or pathogens; appearance; reflective properties; fluorescent properties; refractivity; light-transmitting properties; electrical resistance, impedance or conductance; growth in the presence of specific nutrients; binding or adhesive properties; permeability; association or symbiosis with other organisms; pathogenicity; physical properties such as density, strength, hardness, brittleness, flexibility, rigidity, turgor pressure, electrical impedance, electrical resistance, electrical conductivity, magnetism, permeability, viscosity, color, texture or grain; behavior; response to environmental stimuli; expression of a polynucleotide; activity of an enzyme; rates of genetic or epigenetic change or mutation; ability to take up and/or integrate homologous or heterologous nucleic acid sequences; phenotypic diversity of a population; ability to be stained by dyes or compounds eliciting a change in color; resistance to antibiotics or toxins; resistance to penetration; quality of or production of products such as food, feed, fuel, fiber, structural materials, pharmaceutical compounds, cosmetics, dyes, chemicals, proteins, lipids, nucleic acids, fertilizers, feedstocks for the production thereof, or combinations or precursors thereof.

Organisms expressing one or more specific randomized in-frame fusion polynucleotides can be re-transformed with the same library of randomized in-frame fusion polynucleotides, a similar library, or a different library, and the process of selecting or screening for altered properties of the organism repeated. In this manner, an iterative approach of transformation, selection, re-transformation, re-selection, etc. can be used to continue altering properties or phenotypes of the organism.

A randomized in-frame fusion polynucleotide can also be re-isolated from an organism transformed with the randomized in-frame fusion polynucleotide. The re-isolation can be done using any of a number of methods including, but not limited to, PCR amplification and plasmid rescue (Ward 1990, Dolganov 1993) followed by plasmid transformation into a laboratory organism such as E. coli. After re-isolation, it is possible to re-transform the randomized in-frame fusion polynucleotide into the same organism and/or a different organism to confirm that the randomized in-frame fusion polynucleotide reproducibly confers the same phenotype in repeated experiments.

In another embodiment of the invention, an iterative selection can be performed with the library of randomized in-frame fusion polynucleotides to arrive at sequentially smaller collections of in-frame fusion polynucleotides capable of conferring a phenotype of interest. FIG. 3 shows an example of such an iterative selection, in which a library of randomized in-frame fusion polynucleotides is introduced into an organism, selected for a phenotype of interest, and the plasmids encoding putative active in-frame fusion polynucleotides re-isolated from the population of selected organisms. This selected population of plasmids encoding putative active in-frame fusion polynucleotides conferring a phenotype of interest can now be re-introduced into the organism, or introduced into another organism, for a second round of phenotypic selections. After two or more rounds of selection, active in-frame fusion polynucleotides may be obtained that are different, or that have different activity, than those obtained in a single round of selection.

In another embodiment of the invention, an iterative selection can be performed with the library of randomized in-frame fusion polynucleotides in a manner that the 5′ and 3′ ORFs are recombined with each other to arrive at sequentially smaller collections of randomized in-frame fused polypeptides conferring a higher frequency of desirable phenotypes of interest and/or more desirable phenotypic values. FIG. 4 shows an example of such an iterative selection, in which a library of randomized in-frame fusion polynucleotides is introduced into an organism, selected for a phenotype of interest, and the plasmids encoding the in-frame fusion polynucleotides re-isolated from the population of selected organisms. The 5′ ORFs and 3′ ORFs are then isolated from the population of plasmids, for example by PCR amplification, and are used as starting sequence collections for construction of a new library of iterated in-frame fusion polynucleotides.

The resulting recombined or re-assembled library of in-frame fusion polynucleotides contains the sequences isolated from cells obtained by selection, and allows the sequences enriched during selection to be recombined with each other randomly. In the process it is possible that new combinations of in-frame fusion polynucleotides are formed that were not present in the starting library used for the first or previous round of selection. It is also possible that specific sequences are represented at different levels in the re-assembled library of in-frame fusion polynucleotides than in the original library. The re-assembled library can now be re-introduced into the organism, or introduced into another organism, for a second or subsequent round of phenotypic selections. After two or more rounds of selection, in-frame fusion polynucleotides may be obtained that are different, confer different phenotypes and/or phenotypic values, or that have different activity than those obtained in a single round of selection.
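The formation of new combinations during re-assembly can be illustrated with a small example. The following Python sketch is a simplified model, not a description of the laboratory procedure: ORFs are represented by hypothetical single-letter labels, and re-assembly is modeled as pairing every recovered 5′ ORF with every recovered 3′ ORF.

```python
from itertools import product


def reassemble(selected_fusions):
    """Return all 5'/3' pairings obtainable from the selected fusions.

    selected_fusions -- iterable of (five_prime_orf, three_prime_orf) labels
                        recovered from the selected population
    """
    five_prime_pool = sorted({five for five, _ in selected_fusions})
    three_prime_pool = sorted({three for _, three in selected_fusions})
    return set(product(five_prime_pool, three_prime_pool))


# Hypothetical fusions recovered after a first round of selection.
selected = [("A", "X"), ("B", "Y"), ("A", "Y"), ("C", "X")]

library = reassemble(selected)
novel = library - set(selected)

print(f"re-assembled library size: {len(library)}")
print(f"combinations absent from the selected population: {sorted(novel)}")
```

In this example, four recovered fusions yield a re-assembled library of six combinations, two of which (B with X, and C with Y) were not present in the selected population.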

In the context of iterative selection of randomized in-frame fusion polynucleotides capable of conferring a phenotype of interest, specific sequences may be added to a collection of 5′ ORFs and 3′ ORFs, to be included in the re-assembled library or collection of iterated randomized in-frame fusion polynucleotides.

In another embodiment of the invention, error-prone PCR can be used during such iterative selection methods, specifically for re-amplification of 5′ ORFs and/or 3′ ORFs from plasmids isolated from a population of selected organisms, to introduce additional sequence diversity into polynucleotide sequences associated with a specific phenotype of interest. For example, use of lower-fidelity thermostable polymerases (Cline 1996, Biles 2004), or PCR-based incorporation of mutagenic nucleotide analogs such as 8-oxo-dGTP, dPTP, 5-bromo-dUTP, 2-hydroxy-dATP and dITP (Spee 1993, Kuipers 1996, Zaccolo 1996, Zaccolo 1999, Kamiya 2004, Kamiya 2007, Ma 2008, Petrie 2010, Wang 2012a) can be used to introduce random mutations into target sequences during PCR amplification, and to increase the sequence diversity of a starting sequence or population of sequences.

The iterative rounds of phenotypic selection as described above can be for the same phenotype, or the same phenotype at a different stringency of selection, or for a different phenotype. For example, when selecting for tolerance to a toxic compound, an initial selection can be performed at a concentration of the compound of 2%, which may be growth inhibitory but not lethal. One round of selection under such conditions may result in a selected population of organisms that exhibit a growth advantage at 2% of the selective agent. The in-frame fusion polynucleotides represented within this selected population of organisms can be re-introduced into the same organism or another organism, and subjected to either the same selection, or to a more stringent selection, for example at 3% of the compound. At this concentration, the toxic compound may be lethal to the wild-type organism. Introduction of the selected in-frame fusion polynucleotides, or of a new library of iterated and/or re-assembled in-frame fusion polynucleotides produced from the selected in-frame fusion polynucleotides, into the organism may result in a higher proportion of the organisms that are tolerant of or capable of growth in 3% of the toxic compound than would be the case if the initial selection had been performed at 3%. Alternatively, the selected in-frame fusion polynucleotides can be iterated and re-assembled to form a new library for introduction into the organism. Moreover, due to the enrichment for specific sequences that may have occurred during the first round of selection of in-frame fusion polynucleotides capable of conferring tolerance or resistance to the toxic compound, a different set of in-frame fusion polynucleotides may be represented after the second round of selection than would have been found if the selection had been performed in a single round at 3% of the toxic compound. In addition, this final set of in-frame fusion polynucleotides may contain polynucleotides that confer a higher level of tolerance or resistance to the toxic compound than would have been found if the selection had been performed in a single round at 3% of the toxic compound.

Iterative selections performed after introduction of libraries of randomized in-frame fusion polynucleotides into an organism, as described above and represented in FIG. 3, can be performed using 2 or more rounds of selection, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100 rounds, or more, or any number in between.

Iterative selections performed after introduction of libraries of random in-frame fusion polynucleotides into an organism, accompanied by isolation of the component polynucleotides and their re-assembly into a new collection of random in-frame fusion polynucleotides as described above and represented in FIG. 4, can be performed using 2 or more rounds of selection, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100 rounds, or more, or any number in between, with re-assembly of new collections of random in-frame fusion polynucleotides between 1 or more of these rounds of selection.

An organism expressing a randomized in-frame fusion polynucleotide and having an altered phenotype as a result of the randomized in-frame fusion polynucleotide can be used as a starting point for further phenotypic changes by transforming this organism again with a library of randomized in-frame fusion polynucleotides. The library of random in-frame fusion polynucleotides in the second round of improvement can be the same library that was used to generate the organism with an altered phenotype, or it can be a different library. Such iterative rounds of transformation of an organism with randomized in-frame fusion polynucleotide libraries and selection for phenotypes can result in multiple phenotypic changes, or phenotypic changes that are more profound than can be achieved with a single round of transformation and selection.

In another embodiment of this invention, a collection of organisms transformed with a library of randomized in-frame fusion polynucleotides is selected or screened for alterations in the expression of polynucleotide sequences, either homologous or heterologous to the organism, compared to control organisms transformed with empty vector sequences.

In another embodiment of this invention, a collection of organisms transformed with a library of randomized in-frame fusion polynucleotides is selected or screened for altered rates of genomic or genetic changes. These genomic and genetic changes include but are not limited to: point mutations; sequence insertions, deletions, or inversions; repeat copy number variation; chromosomal translocations; chromosome crossovers; gene conversion; alterations in the distribution, prevalence, position or expression of transposons; uptake of foreign nucleic acid sequences; integration of foreign nucleic acid sequences; or combinations thereof resulting in complex sequence changes and genome rearrangements.

In yet another embodiment of this invention, a collection of organisms transformed with a library of randomized in-frame fusion polynucleotides is selected or screened for higher yield of a material or compound produced by the organism.

In a further embodiment of the invention, a collection of organisms transformed with a library of randomized in-frame fusion polynucleotides is selected or screened for the absence of genetic checkpoints that limit the growth rate, productivity or other properties of the cell or organism. In particular, this allows isolation of organisms with constitutive production of a material or compound that is naturally produced only in certain physiological or growth states, or is produced at maximal levels only in certain physiological or growth states.

In another embodiment of the invention, a collection of organisms transformed with a library of randomized in-frame fusion polynucleotides is selected or screened for altered activity or specificity of enzymes or biochemical pathways expressed by the cell.

In a still further embodiment of the invention, the collection of randomized in-frame fusion polynucleotides is made by randomly fusing one or a small number of polynucleotides of interest with a larger collection of polynucleotides. In this manner it is possible to create a collection of variants or mutants of the polynucleotides of interest, which can be screened for specific properties. In particular, in this manner it is possible to screen for enzymes with higher activity, altered activity, altered temperature optimum, altered pH optimum, resistance to high temperatures or extreme pHs, resistance to acids or bases, resistance to desiccation, resistance to organic solvents, resistance to high salt concentrations, resistance to proteases, or other desirable properties of an enzyme.

EXAMPLES

Example 1

Isolation of Randomized In-Frame Fusion Polynucleotides Capable of Conferring Heat, Salt, Ethanol and Butanol Tolerance to Saccharomyces cerevisiae

Product tolerance traits of production microbes are important factors that contribute to maximal yields and titers of fermentation products (Ding 2009, Jia 2009, Dunlop 2011). Ethanol and butanol are industrial products that are toxic and therefore the subject of various efforts aimed at increasing the tolerance of yeast cells to these alcohols. Butanol is featured as a target of this example because it is representative of medium-chain fuels and chemicals, many of which have high toxicity and whose production is being attempted and optimized in microbes (Dunlop 2011, Jang 2012, Lee 2012). Butanol is a chemical feedstock used for the production of many other chemicals (Mascal 2012). Tolerance to heat, salt and low pH are also industrially relevant properties of production organisms as many production systems generate heat, are conducted in an environment containing salts (e.g. NaCl) or an otherwise high-osmotic environment, or are conducted at or generate low pH.

Producing a Saccharomyces cerevisiae Collection of Randomized In-Frame Fusion Polynucleotides

A collection or library of Saccharomyces cerevisiae in-frame fusion polynucleotides is prepared as described in U.S. patent application Ser. No. 14/134,619 and International Patent Application Serial Number PCT/US13/76526. The randomized in-frame fusion polynucleotides are cloned into a vector molecule, such as a p416-GAL1 derivative. This p416-GAL1 derivative vector is derived from the yeast centromeric plasmid p416-GAL1 (Funk 2002), which contains the following sequences for plasmid propagation in yeast and E. coli and expression of an inserted polynucleotide: the bacterial replicon of plasmid pMB1, the bacterial ampicillin/carbenicillin-resistance gene, the yeast CEN6/ARSH4 cassette (Sikorski 1989) containing the chromosome 6 centromere and the yeast histone H4-associated autonomously replicating sequence (ARS), the yeast URA3 prototrophic marker gene, and the yeast GAL1 promoter and CYC1 terminator placed adjacent to each other in a manner that allows insertion of coding regions in between to allow their expression. The sequence of this p416-GAL1 derivative is given in SEQ ID NO:127. All randomized in-frame fusion polynucleotides are cloned between nucleotide numbers 3206 and 3207 of SEQ ID NO:127. The vector is PCR amplified using oligonucleotides PG0085A (SEQ ID NO:147) and PG0088A (SEQ ID NO:148) for use in assembly of the randomized in-frame fusion polynucleotide collection.

Each of the 5′ ORFs prepared for the randomized in-frame fusion polynucleotide collection is flanked by the conserved sequence SEQ ID NO:139 at its 5′ end and by the conserved sequence SEQ ID NO:140 at its 3′ end. For re-assembly of 5′ ORFs into new randomized in-frame fusion polynucleotide collections (described below), pools of 5′ ORFs are PCR amplified using oligonucleotides PG0085 (SEQ ID NO:143) and PG0100 (SEQ ID NO:144).

Each of the 3′ ORFs prepared for the randomized in-frame fusion polynucleotide collection is flanked by the conserved sequence SEQ ID NO:141 at its 5′ end and by the conserved sequence SEQ ID NO:142 at its 3′ end. For re-assembly of 3′ ORFs into new randomized in-frame fusion polynucleotide collections (described below), pools of 3′ ORFs are PCR amplified using oligonucleotides PG0101 (SEQ ID NO:145) and PG0088 (SEQ ID NO:146).

Sequence Amplification General Method

All PCR amplifications are performed using the following method.

The two amplification primers, each at a final concentration of 1.2 μM, are combined with 10 ng of template DNA, PCR buffer and thermostable polymerase in a total reaction volume of 50 μl. A high-fidelity thermostable polymerase such as Phusion™ Hot Start II thermostable high-fidelity polymerase (Thermo Scientific) can be used. For Phusion™ polymerase, the 5× HF amplification buffer supplied with the enzyme is used for all amplifications. All amplifications are performed on T100 thermal cyclers (Bio-Rad Laboratories) containing 96-well blocks. The deoxynucleotide triphosphates (dNTPs) used in all amplifications are a stock containing 10 mM of each dNTP, also obtained from Thermo Scientific. Deionized water is used in all reactions and to make all solutions not supplied with the polymerase. PCR amplicons are generated by denaturing at 95° C. for 2 minutes followed by 10-35 cycles of: 20 seconds at 95° C., 20 seconds at 60° C., and 1 min/kb at 72° C. (but a minimum of 30 seconds at 72° C.). The efficiency of formation of the PCR product is measured by agarose electrophoresis or by fluorescence spectroscopy using a fluorometer, such as a Qubit® fluorometer (Life Technologies). Successful PCR reactions can be purified using silica resins suitable for DNA purification. Unsuccessful reactions are repeated by varying the Mg²⁺ concentration in the PCR reaction and/or other reaction conditions. Following successful amplification of each ORF, the concentration of each PCR product is normalized, and products corresponding to specific size ranges are pooled.
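The cycling parameters above can be expressed as a short calculation. The following Python sketch is provided for illustration only; the function names are hypothetical, and the computed program time excludes temperature ramping and any final extension or hold step, which are not specified above.

```python
def extension_seconds(amplicon_bp):
    """Extension time at 72 C: 1 min/kb, with a minimum of 30 seconds."""
    return max(30.0, 60.0 * amplicon_bp / 1000.0)


def cycling_seconds(amplicon_bp, cycles):
    """Programmed time: 2 min initial denaturation plus the stated cycles."""
    if not 10 <= cycles <= 35:
        raise ValueError("the method specifies 10-35 cycles")
    # Each cycle: 20 s denaturation, 20 s annealing, then extension.
    per_cycle = 20.0 + 20.0 + extension_seconds(amplicon_bp)
    return 120.0 + cycles * per_cycle


# Amount of each primer in one reaction: 1.2 uM in 50 ul (uM x ul = pmol).
primer_pmol = 1.2 * 50

print(f"each primer per reaction: {primer_pmol:.0f} pmol")
print(f"300 bp amplicon extension: {extension_seconds(300):.0f} s")
print(f"2500 bp amplicon extension: {extension_seconds(2500):.0f} s")
print(f"1000 bp amplicon, 30 cycles: {cycling_seconds(1000, 30) / 60:.0f} min")
```

Amplicons shorter than 500 bp are governed by the 30 second minimum, which is why pooling products by size range, as described above, allows a single extension time to be used for each pool.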

All PCR amplifications follow the same general procedure:

1. A PCR mix as described below is prepared for each stage of the PCR reaction, and is kept cold until inserted into the thermal cycler.

2. The samples are mixed thoroughly and then centrifuged at 4000 rpm for 1 minute to bring the reaction contents to the bottom of the tube or well in a plate.

3. The plates or tubes are inserted into a thermal cycler.

Yeast Transformation

Yeast transformations are performed by the lithium acetate/heat shock method (Gietz 2002, Gietz 2006, Gietz 2007). The yeast strain BY4741 (Brachmann 1998) from a plate or an overnight culture is inoculated into 50 ml of YPD medium (20 g Bacto Peptone, 10 g Bacto Yeast Extract and 20 g Glucose per liter) at 30° C. on a shaker at 225 rpm from a starting density of 5×10⁶ cells/ml, and grown over several hours to a final cell density of 2×10⁷ cells/ml. The cells are harvested by centrifuging at 3000 g for 5 min, resuspended in 25 ml of sterile deionized water, and centrifuged again. Cells are resuspended in 1 ml of sterile water, transferred to a 1.5 ml microcentrifuge tube, centrifuged for 30 sec at 3000 rpm and the supernatant aspirated. The cell pellet is then resuspended in 0.4 ml of sterile deionized water. The cell suspension is combined with 3.26 ml of transformation mix (2.4 ml of 50% w/v PEG 3350, 360 μl 1 M Lithium acetate and 500 μl 10 mg/ml sheared, boiled salmon sperm DNA) and mixed well. Aliquots of DNA (100 ng-1 μg) are pipetted into separate 1.5 ml microcentrifuge tubes and combined with 380 μl of the cell suspension in transformation mix. The cell/DNA mixture is mixed thoroughly and is incubated at 42° C. on a shaker at 250 rpm for 40 minutes. The transformations are then centrifuged for 1 minute at 3000 rpm in a microcentrifuge, the supernatant aspirated and each cell aliquot resuspended in 0.5-1 ml sterile deionized water. Depending on the desired density of colonies, 10 μl to 1 ml of the cell suspension are plated with sterile 4 mm glass beads onto one 10 cm or 15 cm plate containing synthetic complete uracil dropout medium having glucose as a carbon source (for 1 L, 6.7 g yeast nitrogen base, 0.77 g uracil dropout mix, 15 g Bacto agar, 120 μl 10 N NaOH to bring the pH to 5.6-5.8, and 20 g glucose). After drying, the plates are covered and incubated at 30° C. or at a selective temperature for several days until colonies of transformants have formed.
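The volumes used in the transformation can be checked with a short calculation. The following Python sketch is illustrative only; it assumes that volumes are additive, that the volume of the cell pellet and of the added DNA is negligible, and that no cells are lost during washing. The variable names are hypothetical.

```python
# Components of the transformation mix, in microliters.
PEG_UL = 2400          # 50% w/v PEG 3350
LIAC_UL = 360          # 1 M lithium acetate
CARRIER_DNA_UL = 500   # 10 mg/ml sheared, boiled salmon sperm DNA
CELLS_UL = 400         # cell suspension in sterile deionized water
ALIQUOT_UL = 380       # volume combined with each DNA sample

mix_total_ul = PEG_UL + LIAC_UL + CARRIER_DNA_UL
suspension_ul = mix_total_ul + CELLS_UL

# Final concentrations in the cell suspension (assumes additive volumes).
peg_percent = 50.0 * PEG_UL / suspension_ul
liac_molar = 1.0 * LIAC_UL / suspension_ul
carrier_mg_per_ml = 10.0 * CARRIER_DNA_UL / suspension_ul

# Number of full 380 ul aliquots, and cells per aliquot if the entire
# 50 ml culture at 2e7 cells/ml is recovered (an idealized assumption).
n_aliquots = suspension_ul // ALIQUOT_UL
cells_per_aliquot = 50 * 2e7 * ALIQUOT_UL / suspension_ul

print(f"transformation mix: {mix_total_ul} ul")
print(f"PEG {peg_percent:.1f}% w/v, LiAc {liac_molar:.3f} M, "
      f"carrier DNA {carrier_mg_per_ml:.2f} mg/ml")
print(f"{n_aliquots} aliquots of {ALIQUOT_UL} ul, "
      f"about {cells_per_aliquot:.1e} cells each")
```

Under these assumptions, one 50 ml culture provides nine transformation aliquots of approximately 10⁸ cells each.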

Screening for In-Frame Fusion Polynucleotides Conferring Heat, Salt, Low pH, Ethanol and Butanol Tolerance

After formation of colonies or lawns of cells transformed with randomized in-frame fusion polynucleotides, the transformed cells are removed from the selective plates by scraping with glass beads. This is done by adding to each 10 cm plate 5 ml synthetic complete uracil dropout medium with galactose as a carbon source (for 1 L, 6.7 g yeast nitrogen base, 0.77 g uracil dropout mix, 120 μl 10 N NaOH to bring the pH to 5.6-5.8 and, added after autoclaving, 100 ml sterile-filtered 20% galactose) together with 10×4 mm glass beads. Proportionally higher volumes of medium are added to larger plates. Using swirling and horizontal shaking motions to allow the glass beads to dislodge the yeast cells from the surface of the agar, the resuspended cells are collected with a pipet, using additional medium if desired to wash any remaining cells off the plate. Cells collected in this fashion are pelleted by centrifugation at 4000 rpm for 5 minutes. Cells are resuspended in synthetic complete uracil dropout medium with galactose as a carbon source at a cell density of 5×10⁶ cells/ml and cultured at 30° C. shaking at 250 rpm for 4-12 hours. This pre-culturing step allows induction of the GAL1 promoter used to express the randomized in-frame fusion polynucleotides.

For heat tolerance selection, cells are plated on synthetic complete uracil dropout medium with galactose as a carbon source. The cells are spread on the plate using 10-15 4 mm sterile glass beads. After drying, the plates are incubated at 30° C. for 24 hours followed by incubation at 40° C. for four days. Individual colonies able to resist the high temperature are visible 5 days after plating.

Alternatively, heat tolerant cells are selected in liquid culture. Populations of yeast transformants containing in-frame fusion polynucleotides are suspended in synthetic complete uracil dropout medium with galactose as a carbon source at a cell density of 5×10⁶ cells/ml in 50 ml of medium in a 500 ml flask and cultured at 40-42° C. shaking at 250 rpm for 7 days.

For selection of randomized in-frame fusion polynucleotides conferring salt tolerance, cells are pre-cultured in synthetic complete uracil dropout medium with galactose as a carbon source, and are then plated on synthetic complete uracil dropout medium with galactose as a carbon source and containing 1 M NaCl (for 1 L, 6.7 g yeast nitrogen base, 0.77 g uracil dropout mix, 120 μl 10 N NaOH to bring the pH to 5.6-5.8, 15 g Bacto Agar, 58.44 g NaCl and, added after autoclaving, 100 ml sterile-filtered 20% galactose). The cells are spread on the plate using 10-15 4 mm sterile glass beads. After drying, the plates are incubated at 30° C. for five days. Individual colonies able to resist the high salt are visible 5 days after plating.

Alternatively, salt tolerant cells are selected in liquid culture. Populations of yeast transformants containing randomized in-frame fusion polynucleotides are suspended in synthetic complete uracil dropout medium with galactose as a carbon source and containing 1.5-2 M NaCl, at a cell density of 5×10⁶ cells/ml in 50 ml of medium in a 500 ml flask, and cultured at 30° C. with shaking at 250 rpm for 7 days.

For selection of randomized in-frame fusion polynucleotides conferring low pH tolerance, populations of yeast transformants containing randomized in-frame fusion polynucleotides are suspended in synthetic complete uracil dropout medium with galactose as a carbon source and containing 0.1 M sodium acetate pH 3.0 (prepared by mixing 2.74 ml glacial acetic acid with 0.11 g anhydrous sodium acetate in 50 ml final volume in a 50 ml tube and filter sterilized by filtering through a 0.2 micron filter), at a cell density of 5×10⁶ cells/ml in 50 ml of medium in a 500 ml flask and cultured at 30° C. with shaking at 250 rpm for 7 days.

For selection of randomized in-frame fusion polynucleotides conferring ethanol tolerance, populations of yeast transformants containing randomized in-frame fusion polynucleotides are suspended in synthetic complete uracil dropout medium with galactose as a carbon source and containing 15% ethanol, at a cell density of 5×10⁶ cells/ml in 50 ml of medium in a 500 ml flask and cultured at 30° C. with shaking at 250 rpm for 7 days.

For selection of randomized in-frame fusion polynucleotides conferring butanol tolerance, populations of yeast transformants containing randomized in-frame fusion polynucleotides are suspended in synthetic complete uracil dropout medium with galactose as a carbon source and containing 3% butanol, at a cell density of 5×10⁶ cells/ml in 50 ml of medium in a 500 ml flask and cultured at 30° C. with shaking at 250 rpm for 7 days.

Alternative conditions can also be used for selection of randomized in-frame fusion polynucleotides conferring tolerance to heat, salt, low pH, ethanol and butanol. For most yeast strains the following conditions are growth inhibitory and can be used for selections: temperatures at or above 39° C.; salt concentrations above 1.25 M; pH values at or below pH 4.0; ethanol concentrations above 9%; and butanol concentrations above 1.5%.

After completing selections in liquid medium for heat, salt, low pH, ethanol and/or butanol tolerance, the selected cultures are transferred to 50 ml centrifuge tubes, pelleted by centrifugation for 5 minutes at 4000 rpm, and the supernatant decanted. The cell pellet is resuspended in 0.2-1.0 ml of synthetic complete uracil dropout medium containing glucose as a carbon source (volume dependent on the size of the cell pellet), and aliquots of the cell suspension plated on synthetic complete uracil dropout medium containing glucose as a carbon source (for 1 L, 6.7 g yeast nitrogen base, 0.77 g uracil dropout mix, 15 g Bacto agar, 120 μl 10 N NaOH to bring the pH to 5.6-5.8, and 20 g glucose). After drying, the plates are incubated at 30° C. for 2-3 days. Colonies arising on the plates are then processed in the same manner as colonies arising on solid selective media (described for heat and salt tolerance selections above) before further manipulation.

Iterative Selection of Randomized in Frame Fusion Polynucleotides Conferring Heat, Salt, Low pH, Ethanol and Butanol Tolerance

To enrich for randomized in-frame fusion polynucleotides that confer the highest phenotypic values for a desirable trait, it is often useful to perform iterative selections. This procedure allows for gradual enrichment of randomized in-frame fusion polynucleotides conferring tolerance to various abiotic stresses, and for isolation of in-frame fusion polynucleotides containing the best combinations of sequences for conferring tolerance phenotypes or other phenotypes of interest. Successive cycles of yeast transformation, selection, plasmid extraction, cloning into E. coli and plasmid purification result in populations of randomized in-frame fusion polynucleotides that effectively confer a trait of interest. This cycle of manipulations can be performed efficiently with populations of polynucleotides, as opposed to individual isolates that are each maintained separately.

Starting with yeast cells recovered from colonies grown from transformants containing randomized in-frame fusion polynucleotides that survived selections with heat, salt, low pH, ethanol or butanol, iterative selections are performed as follows.

The cells are collected by centrifugation and plasmid DNA is purified using a commercial kit (for example the Zymoprep™ yeast plasmid miniprep kit from Zymo Research), following the manufacturer's instructions. The resuspended DNA is introduced into the DH10B (Life Technologies) or EC100 (Epicentre Technologies) strain of E. coli by electroporation. 1 μl DNA is combined with 25 μl electrocompetent cells on ice, transferred into a 1 mm gap size electroporation cuvette, and electroporated at 1.5 kV using a Bio-Rad MicroPulser™ electroporator. Cells are suspended in 0.5 ml LB broth, allowed to recover for 1 hour at 37° C. on a shaker and transformed cells plated in 0.2 ml aliquots onto plates containing LB agar medium with 50 μg/ml carbenicillin.

Bacterial colonies or lawns arising on the plate are removed from the plate by scraping with glass beads as described above, the cells are pelleted by centrifugation, and plasmid DNA is purified from them using standard methods (Sambrook 1989).

The recovered, purified plasmid DNA is then re-introduced into yeast by lithium acetate-heat shock transformation as described above (Gietz 2006). Colonies or lawns of cells arising from the re-introduction transformation are scraped from the plates with glass beads as described above, and the cells are suspended in minimal synthetic complete uracil dropout medium containing galactose as a carbon source. The cells are then used for another round of selections for heat, salt, ethanol, butanol or low pH tolerance, or selections for other desirable phenotypes.

Iterative Selection of Randomized in Frame Fusion Polynucleotides Conferring Heat, Salt, Low pH, Ethanol and Butanol Tolerance by Re-Amplifying and Re-Assembling Polynucleotides Represented in the Selected Population

As noted above, iterative selections for heat and salt tolerance can be performed by serial rounds of transforming yeast with collections of randomized in-frame fusion polynucleotides contained within plasmids, selecting for heat or salt tolerance, extracting the plasmid DNA from surviving cells and reintroducing the extracted plasmid DNA into E. coli and purifying plasmid from the resulting E. coli transformants. However, improved results can be obtained with additional steps of re-amplifying the 5′ ORFs and 3′ ORFs represented in the survivors of the previous round of selection and re-assembling these re-amplified ORFs into a new collection of randomized in-frame fusion polynucleotides, which is then introduced into yeast and the resulting transformants used for one or more additional rounds of selection.

The randomized in-frame fusion polynucleotide plasmid DNA isolated from cells/colonies surviving a selection for heat, salt, low pH, ethanol or butanol tolerance, or complete genomic DNA isolated from the same cells, is used to re-amplify the 5′ ORFs and 3′ ORFs present in the randomized in-frame fusion polynucleotides. The amplification is performed as described above, using oligonucleotides PG0085 (SEQ ID NO:143) and PG100 (SEQ ID NO:144) for amplifying the 5′ ORFs, and using oligonucleotides PG0101 (SEQ ID NO:145) and PG0088 (SEQ ID NO:146) for amplifying the 3′ ORFs. Optionally, mutations can be introduced into the 5′ ORFs and 3′ ORFs in the course of PCR amplification, using either lower-fidelity thermostable polymerases (Cline 1996, Biles 2004), or PCR-based incorporation of mutagenic nucleotide analogs such as 8-oxo-dGTP, dPTP, 5-bromo-dUTP, 2-hydroxy-dATP and dITP (Spee 1993, Kuipers 1996, Zaccolo 1996, Zaccolo 1999, Kamiya 2004, Kamiya 2007, Ma 2008, Petrie 2010, Wang 2012a). The re-amplified 5′ ORFs and 3′ ORFs are electrophoresed on a 1.5% agarose gel to remove amplification products below 200 bp in size and are purified from the gel using a commercial kit. The p416-GAL1 vector DNA is also re-amplified and purified after electrophoresis.

Re-Assembly of Re-Amplified ORFs into a Random in Frame Fusion Polynucleotide Collection

The purified, re-amplified 5′ ORFs and 3′ ORFs are re-assembled with the p416-GAL1 vector DNA as described below and are introduced into E. coli as a new collection of randomized in-frame fusion polynucleotides, and the resulting plasmid DNA is purified using a commercially available plasmid purification kit, following the manufacturer's recommendations. This is done using a ligation-independent cloning method. The following single-tube procedure uses a single-stranded exonuclease to create single-stranded tails at the ends of the DNA molecules to be assembled, followed by annealing of the homologous ends and fill-in of the remaining single-stranded regions. The purified DNA fragments resulting from PCR amplification of the expression vector backbone and 5′ and 3′ ORFs (see above) are combined in roughly equimolar amounts for a total of approximately 100 ng DNA in a 10 μl reaction. A 10× assembly buffer (500 mM Tris-HCl pH 8.0, 100 mM MgCl₂, 100 mM β-mercaptoethanol, 1 mM each of the 4 dNTPs) is added to produce a 1× concentration. Also added to the reaction are 0.01 unit of a single-stranded exonuclease and 1 unit of a thermostable, high-fidelity hot-start polymerase such as Phusion™ polymerase. Hot start implies that at physiological temperatures the polymerase is in an inactive form, for example being bound by an antibody or other compound, preventing it from competing with the exonuclease for binding to DNA ends in the early stages of the reaction. The reaction volume is adjusted to 10 μl, the reaction is mixed gently and incubated at 37° C. for 5 minutes allowing the exonuclease to act on the DNA ends. The temperature is then raised to 50-60° C. to inactivate the exonuclease and activate the polymerase while promoting annealing of single-stranded ends of the DNA molecules. The mixture is incubated at 50-60° C. for 30 minutes and the temperature is then reduced to 4° C. to stop the reaction. 
The reaction can be performed in a PCR machine for efficient temperature changes. After completion, the reaction mixture can be stored at −20° C. and is ready to be transformed into competent Saccharomyces cerevisiae as described above.
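The roughly equimolar mixing of vector and ORF fragments described above can be illustrated with a short calculation. The following is a minimal sketch only and is not part of the described procedure: the fragment lengths are hypothetical placeholders (for pooled ORFs an average length would have to be estimated), and the function and variable names are illustrative. It relies on the fact that, for double-stranded DNA at equal molar amounts, the mass of each fragment is proportional to its length.

```python
def equimolar_masses_ng(lengths_bp, total_ng=100.0):
    """Split total_ng of DNA so that every fragment is present at the same molar amount.

    At a fixed number of moles, the mass of double-stranded DNA scales with its
    length, so each fragment's share of the total mass equals its share of the
    summed fragment lengths.
    """
    total_length = sum(lengths_bp.values())
    return {name: total_ng * bp / total_length for name, bp in lengths_bp.items()}


# Hypothetical lengths in bp; real values depend on the vector and the ORF pools.
fragment_lengths_bp = {
    "vector backbone": 5000,
    "5' ORF pool": 1000,
    "3' ORF pool": 1000,
}

masses_ng = equimolar_masses_ng(fragment_lengths_bp)
for name, ng in masses_ng.items():
    print(f"{name}: {ng:.1f} ng")
```

Under these assumed lengths most of the 100 ng is contributed by the vector backbone, because it is the longest fragment.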

Exonucleases that are suitable for this procedure are T4 DNA polymerase, Exonuclease III, lambda exonuclease, T5 exonuclease or T7 exonuclease. Exonucleases with 5′ to 3′ activity directionality (i.e. T4 polymerase, lambda exonuclease, T5 exonuclease or T7 exonuclease) are preferred as they result in higher numbers of base pairs of annealed sequence between the two nicks at each cloning junction, thus stabilizing the desired product. The reaction may also be supplemented with polyethylene glycol (molecular weight 4000-10000) at a final concentration of 5-10% to promote annealing of single-stranded DNA ends if desired.

After production of new re-assembled randomized in-frame fusion polynucleotide collections, populations of yeast cells transformed with these collections are exposed to the same selective conditions to select cells containing plasmids and polynucleotides conferring heat, salt, low pH, ethanol or butanol tolerance. Cell populations containing collections of re-assembled randomized in-frame fusion polynucleotides may contain randomized in-frame fusion polynucleotides with different sequence combinations, or with sequence combinations at different frequencies or concentrations, compared to the initial randomized in-frame fusion polynucleotide collection, or compared to smaller populations of randomized in-frame fusion polynucleotides selected directly from the initial collection as described above. The sequence combinations formed by the reassembly process may confer better protection against abiotic stresses and selective agents, resulting in more desirable phenotypic values of transformants containing such sequence combinations.

The phenotypic values conferred by individual in-frame fusion polynucleotides isolated from randomized in-frame fusion polynucleotide collections using any of the selection methods described above can be measured and compared between different transformants to find the randomized in-frame fusion polynucleotides conferring the highest level of protection.

Testing of Individual Randomized in Frame Fusion Polynucleotides for Conferral of Heat, Salt, Ethanol, Butanol and Low pH Tolerance in Cell Survival Assays

Cell survival assays are performed to allow comparative testing of individual, randomized in-frame fusion polynucleotides isolated from colonies of cells grown from survivors of heat, salt, low pH, ethanol or butanol selections as described above. The procedure relies extensively on nucleic acid and cell transfers that occur from 96-well plate to 96-well plate, retaining the original order of the plasmids of the first plate used for cultivation of E. coli transformants and plasmid DNA prepping.

Plasmid DNA isolated from individual yeast colonies is transformed into competent E. coli cells and plated at low cell densities onto LB agar plates containing 50 μg/ml carbenicillin, allowing individual E. coli colonies to grow. Individual E. coli colonies are transferred to 96-well deep-well plates, each well containing 1 ml LB with 50 μg/ml carbenicillin, and grown overnight at 37° C. Certain wells in the plate are reserved for cells transformed with control plasmids that either lack an insert or contain inserts known not to confer abiotic stress tolerance. After incubation, the randomized in-frame fusion polynucleotide plasmid DNA is isolated from all clones in the plate.

The isolated, purified randomized in-frame fusion polynucleotide plasmid DNA is then re-introduced into yeast by lithium acetate—heat shock transformation as described above but using a 96-well plating format so that each transformation is plated separately into a well of a 2 ml deep-well plate containing synthetic complete uracil dropout medium with glucose as a carbon source. The transformants are allowed to grow under selection for three days at 30° C.

Cells are removed from the 96-well transformation plate by adding 500 μl synthetic complete uracil dropout medium with glucose as a carbon source and shaking the plate on a microshaker at 1000 rpm for 30 minutes. Aliquots of 250 μl of each cell suspension are then added to a fresh 2 ml deep-well 96-well plate containing in each well 500 μl of synthetic complete uracil dropout medium with galactose as a carbon source, and grown overnight at 30° C. with shaking at 1000 rpm. This culturing step generates sufficient cell numbers for all subsequent assays while simultaneously exhausting the glucose in the growth medium and allowing induction of the GAL1 promoter from which the fusion genes are expressed.

Cell densities are determined by hemocytometer counting for 6 overnight cultures (6 different wells) per plate and averaged. This average is used to calculate a transfer volume of the cell suspensions for addition to 1.3 ml fresh medium, to result in a final cell density of 10⁷ cells/ml. The calculated volume of suspended cells is added from each culture (each well in the plate) to a fresh 96-well plate having 1.3 ml per well of YPGal rich medium with galactose as a carbon source (20 g Bacto Peptone, 10 g Bacto Yeast Extract and 20 g galactose per liter). This YPGal starter plate is grown for four hours at 30° C., with shaking at 1000 rpm.
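The transfer-volume calculation described above can be sketched as follows. This is an illustrative example only: the six hemocytometer counts are hypothetical, the function name is not taken from the source, and the sketch assumes that the added inoculum volume is counted in the final culture volume (the text does not say whether it is).

```python
from statistics import mean


def transfer_volume_ml(densities_cells_per_ml, fresh_medium_ml=1.3, target_cells_per_ml=1e7):
    """Volume of overnight culture to add to fresh medium to reach the target density.

    Solves avg * v / (fresh_medium_ml + v) = target for v, i.e. the added
    inoculum volume is included in the final volume (an assumption).
    """
    avg = mean(densities_cells_per_ml)
    if avg <= target_cells_per_ml:
        raise ValueError("average culture density must exceed the target density")
    return fresh_medium_ml * target_cells_per_ml / (avg - target_cells_per_ml)


# Hypothetical counts (cells/ml) from six wells of one plate.
counts = [1.2e8, 1.5e8, 1.4e8, 1.3e8, 1.6e8, 1.4e8]
volume_ml = transfer_volume_ml(counts)
print(f"average density: {mean(counts):.2e} cells/ml")
print(f"transfer volume: {volume_ml * 1000:.0f} microliters per well")
```

Because a single averaged density is applied to every well, wells whose true density departs from the plate average will start slightly above or below the target.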

Five 96-well 2 ml deep-well plates serve as selective culture plates and are inoculated from the YPGal starter plate by adding 0.25 ml YPGal starter culture from each well of the starter plate to 0.25 ml of 2× selective medium in each well of the selective culture plate. The 2× selective media used in the five selective culture plates are: 1) YPGal to select for heat tolerance by incubation at 42° C.; 2) YPGal+4 M NaCl to select for salt tolerance; 3) YPGal+0.2 M sodium acetate pH 3.0 to select for low pH tolerance; 4) YPGal+30% ethanol to select for ethanol tolerance; and 5) YPGal+6% butanol to select for butanol tolerance. The final concentrations of the selective agents are as follows: NaCl: 2 M; low pH: 0.1 M sodium acetate pH 3.0; ethanol: 15%; butanol: 3%.
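The relationship between the 2× selective media and the final concentrations listed above is a simple dilution, since equal volumes (0.25 ml each) of starter culture and 2× medium are combined. The following minimal sketch, with illustrative function and variable names, reproduces the stated final concentrations from the stated 2× concentrations.

```python
def final_concentration(stock_concentration, stock_ml=0.25, culture_ml=0.25):
    """Concentration of a selective agent after mixing 2x medium with starter culture."""
    return stock_concentration * stock_ml / (stock_ml + culture_ml)


# Concentrations in the 2x selective media, as given in the text.
two_x_media = {
    "NaCl (M)": 4.0,
    "sodium acetate pH 3.0 (M)": 0.2,
    "ethanol (%)": 30.0,
    "butanol (%)": 6.0,
}

final_media = {agent: final_concentration(c) for agent, c in two_x_media.items()}
for agent, concentration in final_media.items():
    print(f"{agent}: {concentration:g}")
```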

The selective plates are incubated at 30° C. (42° C. for the heat selection), with shaking at 1000 rpm for 72 hours. The plates are then removed from the shaker and each culture spotted, without dilution and at a 1:10 dilution, in 3 μl spots onto 15 cm plates containing YPD agar (for 1 L, 20 g Bacto Peptone, 10 g Bacto Yeast Extract, 20 g glucose and 15 g Bacto Agar) using a Bel-Art Products Bel-Art 96-well replicating tool. The plates are incubated at 30° C. for 48 hours.

The density of the cells growing on each pair of undiluted and diluted spots indicates the number of surviving cells in each culture. Spots are scored on a scale from 0 to 3, 0 being no growth, 1 slight growth, 2 significant growth and 3 confluent growth; both spots, resulting from the two dilutions, are taken into account to generate the score. A panel of plate images, with results of the cell survival assay for 16 randomized in-frame fusion polynucleotides, is shown in FIG. 5. Each randomized in-frame fusion polynucleotide is scored in triplicate in this manner, the scores are added, and the average background is subtracted to generate the final score. All scores are tabulated and are shown in Table 1.
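The scoring arithmetic described above can be sketched as follows. This is one possible reading rather than the exact procedure used: the sketch assumes that the "average background" is the mean of the summed triplicate scores of the control wells, and that negative results are floored at zero. Both are assumptions, as are the function name and the example scores.

```python
from statistics import mean


def survival_score(replicate_scores, control_replicate_sets):
    """Sum triplicate spot scores (each 0-3) and subtract the average control sum.

    Assumptions: background = mean of the summed control triplicates;
    negative results are reported as 0.
    """
    for score in replicate_scores:
        if not 0 <= score <= 3:
            raise ValueError("spot scores must lie between 0 and 3")
    background = mean([sum(control) for control in control_replicate_sets])
    return max(0.0, sum(replicate_scores) - background)


# Hypothetical data: one construct scored in triplicate, four control wells.
construct = [3, 2, 2]
controls = [[0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0]]
score = survival_score(construct, controls)
print(score)
```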

Resistance and tolerance to ethanol and butanol of the 63 yeast transformants are also measured in minimal media containing raffinose and galactose as carbon sources. The 63 strains are compared to a negative control strain transformed with the empty p416-GAL1 plasmid. To perform these experiments, the 63 strains plus the control strain are first grown in 96-well plates containing minimal uracil dropout medium containing 2% raffinose as a carbon source for 6 hours at 30° C. with constant shaking at 200 rpm, following which expression of the fusion genes is induced with galactose at a final concentration of 2% and incubation is continued overnight. Subsequently, 0.1 OD cultures from each well are inoculated into fresh minimal uracil dropout medium containing 1% raffinose and 2% galactose, and containing different concentrations of ethanol and butanol. Four concentrations of ethanol (8, 11, 14 and 17% v/v) and n-butanol (2, 2.5, 3 and 3.5% v/v) are used. The 96-well culture plates are covered with air-permeable sealing films, and all plates are together further sealed in a large airtight plastic bag. This creates a semi-aerobic condition, and the cultures are incubated in a shaking incubator at 30° C. for 3 days. Two dilutions of each culture (1:10 and 1:100) are spotted on minimal media containing glucose as a carbon source using a Bel-Art Products Bel-Art 96-well replicating tool. The plates are incubated at 30° C. for 2 days. An image is taken of each plate, and each dilution is scored for each fusion gene. A score of 0 to 5 is given to each spot based on its growth compared to the negative control strain containing the empty vector. For each randomized in-frame fusion polynucleotide construct and for each dilution, the score is multiplied by its corresponding concentration of butanol/ethanol, and averaged. The same scoring method is followed for the strain with the negative control plasmid on the same plate, providing the background score. The final score is obtained by subtracting the background score from the average score of the strain with the individual randomized in-frame fusion polynucleotide.
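The concentration-weighted scoring described above can be sketched as follows. The sketch is illustrative only: it assumes that the products of score and concentration are averaged over all concentration and dilution combinations, which is one reading of the text, and all spot scores shown are hypothetical. Function names are not taken from the source.

```python
def weighted_average_score(spot_scores):
    """Average of (spot score x solvent concentration) over all scored spots.

    spot_scores maps (concentration_percent, dilution_label) to a score of 0-5.
    """
    products = [score * concentration
                for (concentration, _dilution), score in spot_scores.items()]
    return sum(products) / len(products)


def final_score(construct_scores, control_scores):
    """Construct score minus the background score of the empty-vector control."""
    return weighted_average_score(construct_scores) - weighted_average_score(control_scores)


# Hypothetical n-butanol scores at the four concentrations and two dilutions.
construct = {
    (2.0, "1:10"): 5, (2.0, "1:100"): 4,
    (2.5, "1:10"): 4, (2.5, "1:100"): 3,
    (3.0, "1:10"): 2, (3.0, "1:100"): 1,
    (3.5, "1:10"): 0, (3.5, "1:100"): 0,
}
control = {
    (2.0, "1:10"): 2, (2.0, "1:100"): 1,
    (2.5, "1:10"): 0, (2.5, "1:100"): 0,
    (3.0, "1:10"): 0, (3.0, "1:100"): 0,
    (3.5, "1:10"): 0, (3.5, "1:100"): 0,
}
result = final_score(construct, control)
print(result)
```

Weighting by concentration gives growth at the harsher conditions more influence on the final score than growth at the milder ones.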

The scores in Table 1 can be considered phenotypic values of each randomized in-frame fusion polynucleotide for each selection imposed on the transformants. High scores represent high phenotypic values for the corresponding randomized in-frame fusion polynucleotide.

Two types of randomized in-frame fusion polynucleotides are represented in Table 1. Seven unique in-frame fusion polynucleotides (M25-E1, M25-F4, M25-G8, M25-G10, M25-H11, M26-A12 and M26-D6) were selected by 1 round of direct selection, followed by 1 round of PCR amplification of the 5′ and 3′ ORFs and their re-assembly, followed by another round of direct selection. These 7 randomized in-frame fusion polynucleotides are referred to as “Re-assembled.” The remaining 56 unique in-frame fusion polynucleotides were selected by two iterative rounds of selection. These 56 in-frame fusion polynucleotides are referred to as “Directly selected.”

Average activities for the 7 selection categories shown in Table 1 are computed separately for the two classes of randomized in-frame fusion polynucleotides. The average scores are shown at the bottom of the table. In 6 out of 7 cases the average scores for the re-assembled randomized in-frame fusion polynucleotides are higher than those of the directly selected randomized in-frame fusion polynucleotides. These data indicate that a reassembly step can be advantageous for isolating in-frame fusion polynucleotides conferring high phenotypic values.
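The class comparison can be checked directly from the values reported in Table 1. The sketch below uses the two average rows from the bottom of the table and the heat (42° C.) scores of the seven re-assembled fusion polynucleotides; the variable names are illustrative.

```python
categories = [
    "Heat (rich)", "Ethanol (rich)", "Butanol (rich)", "pH 3 (rich)",
    "Salt (rich)", "Butanol (minimal)", "Ethanol (minimal)",
]
# Average rows as reported at the bottom of Table 1.
directly_selected = [4.29, 2.34, 4.58, 5.62, 6.37, 3.19, 4.44]
re_assembled = [8.18, 2.57, 2.29, 6.68, 7.89, 7.23, 4.57]

# Categories in which the re-assembled class has the higher average.
higher = [name for name, direct, reassembled
          in zip(categories, directly_selected, re_assembled)
          if reassembled > direct]
print(f"{len(higher)} of {len(categories)} categories favor the re-assembled class")

# Heat scores of the seven re-assembled fusion polynucleotides from Table 1.
heat_re_assembled = {
    "M25-E1": 9.75, "M25-F4": 9.75, "M25-G8": 9.75, "M25-G10": 1.75,
    "M25-H11": 6.75, "M26-A12": 9.75, "M26-D6": 9.75,
}
heat_mean = round(sum(heat_re_assembled.values()) / len(heat_re_assembled), 2)
print(f"re-assembled heat average: {heat_mean}")
```

The only category in which the directly selected class has the higher average is butanol tolerance in rich medium.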

Characterization of Positive Randomized in Frame Fusion Polynucleotides and Additional Screens

Randomized in-frame fusion polynucleotide expression constructs conferring the most dramatic or broad phenotypes are sequenced to identify the randomized in-frame fusion polynucleotides. The results are tabulated and the best randomized in-frame fusion polynucleotides chosen for future work. Table 2 shows the identities of the open reading frames found in all 63 yeast in-frame fusion polynucleotides. Their sequences (nucleic acid and protein) are contained in SEQ ID NOs: 1-126.

All resistance and tolerance scores are listed in Table 1 below.

TABLE 1
Resistance and tolerance activities of 63 randomized in-frame fusion polynucleotides in S. cerevisiae

Key: Name = fusion polynucleotide name; NA = fusion polynucleotide nucleic acid sequence SEQ ID; Protein = fusion polynucleotide encoded protein sequence SEQ ID. Activity score columns: Heat (42° C.); Ethanol (15%); Butanol (3%); pH 3 (0.1 M sodium acetate); Salt (2 M NaCl).

                            Activity scores (rich medium)                Activity scores (minimal medium)
Name       NA    Protein    Heat     Ethanol  Butanol  pH 3     Salt     Butanol  Ethanol
Y1-5A      1     64         3.75     0.00     4.75     2.75     4.75     0.0      35.0
Y1-7A      2     65         3.75     1.75     7.75     7.75     7.75     0.0      3.5
Y1-9A      3     66         0.00     1.75     4.75     4.75     4.75     0.0      0.8
Y1-13A     4     67         5.75     2.75     10.75    6.75     6.75     0.0      10.5
Y1-17A     5     68         6.75     2.75     7.75     8.75     7.75     0.0      7.0
Y1-18A     6     69         0.00     0.00     1.75     2.75     3.75     0.0      0.0
Y1-19A     7     70         0.75     5.75     7.75     8.75     7.75     0.0      9.8
Y1-20A     8     71         6.75     0.00     3.75     2.75     2.75     0.0      23.8
Y1-21A     9     72         5.75     0.00     4.75     5.75     7.75     0.0      0.0
Y1-23A     10    73         6.75     0.00     4.75     5.75     4.75     0.0      0.8
Y1-25A     11    74         0.75     0.00     1.75     2.75     1.75     0.0      7.0
Y1-28A     12    75         3.75     2.75     5.75     7.75     6.75     0.0      17.5
Y1-33A     13    76         6.75     0.00     0.00     1.75     4.75     0.0      0.0
Y1-34B     14    77         6.75     5.75     7.75     8.75     7.75     0.0      14.0
Y1-38A     15    78         6.75     2.75     7.75     8.75     4.75     0.0      7.0
Y1-39B     16    79         6.75     3.75     7.75     6.75     5.75     0.0      14.0
Y1-40A     17    80         5.75     0.00     1.75     5.75     4.75     0.0      7.0
Y1-43A     18    81         6.75     5.75     6.75     8.75     7.75     0.0      14.0
Y1-45A     19    82         6.75     0.00     0.00     0.00     1.75     0.0      0.0
Y1-47A     20    83         8.75     5.75     4.75     4.75     4.75     0.0      14.8
Y1-48A     21    84         6.75     5.75     4.75     8.75     7.75     0.0      10.5
Y1-49A     22    85         1.75     0.00     0.00     0.00     1.75     0.0      0.0
Y1-58B     23    86         0.00     0.00     0.00     0.00     0.00     0.0      0.0
Y1-58C     24    87         0.00     0.00     0.00     0.00     0.00     0.0      0.0
Y1-66C     25    88         0.00     0.00     0.00     0.00     0.00     0.0      0.0
Y1-67B     26    89         0.75     0.00     0.00     0.00     2.75     3.0      0.0
Y2-28A     27    90         0.75     0.00     0.00     0.00     0.00     0.0      0.0
M21-A02    28    91         3.75     2.75     4.75     7.25     7.75     4.5      0.0
M21-A03    29    92         6.75     5.75     10.75    11.75    7.75     16.0     0.0
M21-A04    30    93         6.75     0.00     4.75     5.75     4.75     4.5      0.0
M21-A09    31    94         5.75     2.75     4.75     4.75     3.75     4.5      2.8
M21-C08    32    95         6.75     2.75     6.25     5.75     4.75     4.5      2.8
M21-D06    33    96         7.75     0.00     4.00     3.25     9.75     1.0      0.0
M22-C01    34    97         6.75     3.00     7.00     9.25     9.75     16.4     0.0
M22-C05    35    98         6.75     3.00     8.00     6.25     6.75     2.5      0.0
M22-D01    36    99         4.75     3.00     7.00     6.25     9.75     3.0      0.0
M23-C03    37    100        9.75     9.00     10.00    9.25     9.75     11.8     0.0
M23-D02    38    101        0.00     0.00     0.00     3.25     6.75     0.0      0.0
M23-D09    39    102        0.00     9.00     10.00    12.25    12.75    6.5      0.0
M23-E02    40    103        9.75     1.00     5.00     6.25     6.75     3.5      0.0
M23-F02    41    104        9.75     9.00     10.00    12.25    12.75    1.6      0.8
M23-H01    42    105        2.75     0.00     3.00     6.25     6.75     0.0      0.0
M24-A05    43    106        0.75     3.00     7.00     6.25     9.75     3.5      13.0
M24-B12    44    107        3.75     0.00     3.00     6.25     9.75     4.5      4.3
M24-D11    45    108        0.00     3.00     10.00    6.25     3.75     18.3     0.0
M24-E05    46    109        0.00     6.00     7.00     12.25    12.75    14.9     0.0
M24-F06    47    110        6.75     6.00     7.00     9.25     9.75     7.8      0.0
M25-E1     48    111        9.75     3.00     4.00     6.25     9.75     4.0      0.0
M25-F4     49    112        9.75     0.00     0.00     6.25     6.75     2.5      0.0
M25-G8     50    113        9.75     0.00     1.00     6.25     6.75     2.5      0.0
M25-G10    51    114        1.75     6.00     1.00     8.25     8.75     0.0      0.0
M25-H11    52    115        6.75     0.00     0.00     1.25     4.75     2.5      0.0
M26-A12    53    116        9.75     9.00     10.00    12.25    12.75    19.1     14.3
M26-D6     54    117        9.75     0.00     0.00     6.25     5.75     20.0     17.8
M27-A1     55    118        6.75     9.00     4.00     12.25    12.75    20.0     21.3
M27-B7     56    119        9.75     0.00     1.00     6.25     6.75     5.6      0.0
M27-F8     57    120        0.00     0.00     0.00     0.00     0.75     0.8      0.0
M28-A4     58    121        0.00     0.00     0.00     0.00     3.75     0.0      0.0
M28-C9     59    122        0.00     0.00     0.00     0.00     6.75     0.0      0.0
M28-D6     60    123        3.75     0.00     4.00     6.25     9.75     2.5      0.0
M28-E4     61    124        3.75     3.00     4.00     9.25     12.75    0.0      0.0
M29-E7     62    125        6.75     3.00     7.00     9.25     7.75     4.4      7.0
M30-E11    63    126        0.75     0.00     0.00     0.25     8.25     13.0     0.0
Directly selected average   4.29     2.34     4.58     5.62     6.37     3.19     4.44
Re-assembled average        8.18     2.57     2.29     6.68     7.89     7.23     4.57

TABLE 2
Yeast randomized in-frame fusion polynucleotides and their component open reading frames (ORFs)

Key: Name = fusion polynucleotide name; NA = fusion polynucleotide nucleic acid (NA) sequence SEQ ID; Protein = fusion polynucleotide protein sequence SEQ ID. A blank name field means that no polynucleotide name is listed for that ORF.

Part A: 5′ ORFs (Length = 5′ ORF length in bp)

Name     NA  Protein  5′ ORF ID  5′ name  Length  5′ ORF description
Y1-5A    1   64       YDR246W-A           198     Putative protein of unknown function
Y1-7A    2   65       YHR126C    ANS1     477     Putative protein of unknown function
Y1-9A    3   66       YOL026C    MIM1     339     Mitochondrial outer membrane protein
Y1-13A   4   67       YDR488C    PAC11    1599    Dynein intermediate chain
Y1-17A   5   68       YOR043W    WHI2     1458    Activator of the general stress response
Y1-18A   6   69       YLR375W    STP3     1029    Zinc-finger protein of unknown function
Y1-19A   7   70       YOR043W    WHI2     1458    Activator of the general stress response
Y1-20A   8   71       YHL028W    WSC4     1815    ER membrane protein involved in translocation
Y1-21A   9   72       YOL054W    PSH1     789     E3 ubiquitin ligase
Y1-23A   10  73       YFL066C             1176    Y′ element helicase-like protein
Y1-25A   11  74       YGR060W    ERG25    927     C-4 methyl sterol oxidase
Y1-28A   12  75       YJL065C    DLS1     501     ISW2 chromatin accessibility complex subunit
Y1-33A   13  76       YLR094C    GIS3     1506    Protein of unknown function
Y1-34B   14  77       YML064C    TEM1     726     GTP-binding protein of the ras superfamily
Y1-38A   15  78       YML036W    CGI121   649     EKC/KEOPS protein complex component
Y1-39B   16  79       YLR466C-B           114     Dubious open reading frame
Y1-40A   17  80       YDL109C             1941    Putative lipase; involved in lipid metabolism
Y1-43A   18  81       YLR154C-G           147     Putative protein of unknown function
Y1-45A   19  82       YIR016W             795     Putative protein of unknown function
Y1-47A   20  83       YER018C    SPC25    663     Kinetochore-assoc. Ndc80 complex component
Y1-48A   21  84       YML116W    ATR1     1626    Multidrug efflux pump
Y1-49A   22  85       YLR094C    GIS3     1506    Protein of unknown function
Y1-58B   23  86       YDR378C    LSM6     258     Lsm (Like Sm) protein
Y1-58C   24  87       YDR462W    MRPL28   441     Mitochondrial large subunit ribosomal protein
Y1-66C   25  88       YGL235W             534     Putative protein of unknown function
Y1-67B   26  89       YLL039C    UBI4     231     Ubiquitin essential for the cellular stress response
Y2-28A   27  90       YLR154C-G           147     Putative protein of unknown function
M21-A02  28  91       YOR043W    WHI2     1458    Activator of the general stress response
M21-A03  29  92       YOR043W    WHI2     1458    Activator of the general stress response
M21-A04  30  93       YGR209C    TRX2     312     Cytoplasmic thioredoxin isoenzyme
M21-A09  31  94       YGR203W    YCH1     444     Phosphatase similar to Cdc25p
M21-C08  32  95       YBR077C    SLM4     486     Component of the EGO complex
M21-D06  33  96       YNL086W    SNN1     306     Putative protein of unknown function
M22-C01  34  97       YPR080W    TEF1     1374    Translational elongation factor EF-1 alpha
M22-C05  35  98       YKR095W-A  PCC1     339     EKC/KEOPS protein complex component
M22-D01  36  99       YIR015W    RPR2     432     Subunit of nuclear RNase P
M23-C03  37  100      YJL184W    GON7     369     EKC/KEOPS protein complex component
M23-D02  38  101      YPL250C    ICY2     408     Protein of unknown function
M23-D09  39  102      YMR226C             801     NADP(+)-dependent dehydrogenase
M23-E02  40  103      YEL034W    HYP2     471     Translation elongation factor eIF-5A
M23-F02  41  104      YPL250C    ICY2     408     Protein of unknown function
M23-H01  42  105      YLR154C-G           147     Putative protein of unknown function
M24-A05  43  106      YNR049C    MSO1     630     Secretory vesicle docking complex component
M24-B12  44  107      YMR156C    TPP1     714     DNA 3′-phosphatase
M24-D11  45  108      YBR195C    MSI1     1266    Subunit of chromatin assembly factor I
M24-E05  46  109      YGR203W    YCH1     444     Phosphatase similar to Cdc25p
M24-F06  47  110      YHR055C    CUP1-2   183     Metallothionein binding copper and cadmium
M25-E1   48  111      YJR120W             348     Protein of unknown function
M25-F4   49  112      YHR055C    CUP1-2   183     Metallothionein
M25-G8   50  113      YPR062W    FCY1     474     Cytosine deaminase
M25-G10  51  114      YMR195W    ICY1     381     Protein of unknown function
M25-H11  52  115      YLR162W             354     Putative protein of unknown function
M26-A12  53  116      YMR195W    ICY1     381     Protein of unknown function
M26-D6   54  117      YNL259C    ATX1     219     Cytosolic copper metallochaperone
M27-A1   55  118      YDR432W    NPL3     1242    RNA-binding protein
M27-B7   56  119      YOR043W    WHI2     1458    Activator of the general stress response
M27-F8   57  120      YDR246W-A           198     Putative protein of unknown function
M28-A4   58  121      YER018C    SPC25    663     Kinetochore-assoc. Ndc80 complex component
M28-C9   59  122      YDR246W-A           198     Putative protein of unknown function
M28-D6   60  123      YBR197C             651     Putative protein of unknown function
M28-E4   61  124      YDR378C    LSM6     258     Lsm (Like Sm) protein
M29-E7   62  125      YGR063C    SPT4     306     Pol I and Pol II transcriptional regulator
M30-E11  63  126      YLR044C    PDC1     1689    Pyruvate decarboxylase

Part B: 3′ ORFs (Length = 3′ ORF length + stop codon in bp)

Name     3′ ORF ID  3′ name  Length  3′ ORF description
Y1-5A    YOR043W    WHI2     1461    Activator of the general stress response
Y1-7A    YOR043W    WHI2     1461    Activator of the general stress response
Y1-9A    YLR094C    GIS3     1509    Protein of unknown function
Y1-13A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-17A   YJL185C             882     Putative protein of unknown function
Y1-18A   YOR085W    OST3     1053    Oligosaccharyltransferase gamma subunit
Y1-19A   YFL066C             1179    Y′ element helicase-like protein
Y1-20A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-21A   YLR094C    GIS3     1509    Protein of unknown function
Y1-23A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-25A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-28A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-33A   YKL117W    SBA1     651     Hsp90 family co-chaperone
Y1-34B   YOR043W    WHI2     1461    Activator of the general stress response
Y1-38A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-39B   YOR043W    WHI2     1461    Activator of the general stress response
Y1-40A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-43A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-45A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-47A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-48A   YOR043W    WHI2     1461    Activator of the general stress response
Y1-49A   YHR219W             1946    Putative helicase
Y1-58B   YBL075C    SSA3     1950    ATPase involved in protein folding, stress response
Y1-58C   YGL236C    MTO1     2010    Mitochondrial protein
Y1-66C   YLR369W    SSQ1     1974    Mitochondrial hsp70-type molecular chaperone
Y1-67B   YBL081W             1107    Non-essential protein of unknown function
Y2-28A   YOL060C    MAM3     2121    Protein required for mitochondrial morphology
M21-A02  YHR203C    RPS4B    1055    40S ribosomal subunit protein
M21-A03  YLR094C    GIS3     1509    Protein of unknown function
M21-A04  YOR043W    WHI2     1461    Activator of the general stress response
M21-A09  YOR043W    WHI2     1461    Activator of the general stress response
M21-C08  YOR043W    WHI2     1461    Activator of the general stress response
M21-D06  YOR043W    WHI2     1461    Activator of the general stress response
M22-C01  YOR043W    WHI2     1461    Activator of the general stress response
M22-C05  YOR043W    WHI2     1461    Activator of the general stress response
M22-D01  YOR043W    WHI2     1461    Activator of the general stress response
M23-C03  YOR043W    WHI2     1461    Activator of the general stress response
M23-D02  YJL205C    NCE101   305     Protein of unknown function
M23-D09  YBR195C    MSI1     1269    Subunit of chromatin assembly factor I
M23-E02  YOR043W    WHI2     1461    Activator of the general stress response
M23-F02  YOR043W    WHI2     1461    Activator of the general stress response
M23-H01  YGR063C    SPT4     309     Pol I and Pol II transcriptional regulator
M24-A05  YOR043W    WHI2     1461    Activator of the general stress response
M24-B12  YOR043W    WHI2     1461    Activator of the general stress response
M24-D11  YOR101W    RAS1     930     G-protein signaling GTPase
M24-E05  YLR094C    GIS3     1509    Protein of unknown function
M24-F06  YOR043W    WHI2     1461    Activator of the general stress response
M25-E1   YOR043W    WHI2     1461    Activator of the general stress response
M25-F4   YOR043W    WHI2     1461    Activator of the general stress response
M25-G8   YOR043W    WHI2     1461    Activator of the general stress response
M25-G10  YBR195C    MSI1     1269    Subunit of chromatin assembly factor I
M25-H11  YOR043W    WHI2     1461    Activator of the general stress response
M26-A12  YOR043W    WHI2     1461    Activator of the general stress response
M26-D6   YOR043W    WHI2     1461    Activator of the general stress response
M27-A1   YOR043W    WHI2     1461    Activator of the general stress response
M27-B7   YML036W    CGI121   652     EKC/KEOPS protein complex component
M27-F8   YHR008C    SOD2     702     Mitochondrial manganese superoxide dismutase
M28-A4   YNL042W-B           258     Putative protein of unknown function
M28-C9   YPL157W    TGS1     948     Trimethyl guanosine synthase
M28-D6   YLR094C    GIS3     1509    Protein of unknown function
M28-E4   YLR094C    GIS3     1509    Protein of unknown function
M29-E7   YOR043W    WHI2     1461    Activator of the general stress response
M30-E11  YIL033C    BCY1     1251    cAMP-dependent protein kinase (PKA) regulatory subunit

Example 2

Isolation of Randomized In-Frame Fusion Polynucleotides Capable of Conferring Stress Tolerance to Escherichia coli

Producing an E. coli Collection of Randomized In-Frame Fusion Polynucleotides

A collection or library of E. coli randomized in-frame fusion polynucleotides is prepared as described in U.S. patent application Ser. No. 14/134,619 and International Patent Application Serial Number PCT/US13/76526. The randomized in-frame fusion polynucleotides are cloned into a vector molecule (SEQ ID NO:128). This vector is derived from the high-copy plasmid pUC19 and contains the ampicillin/carbenicillin resistance gene, the pMB1 plasmid origin of replication, and the E. coli lac promoter and terminator. The vector is PCR amplified using oligonucleotides PG0185A (SEQ ID NO:129) and PG0188A (SEQ ID NO:130) for use in assembly of the randomized in-frame fusion polynucleotide collection.

Each 5′ ORF prepared for the randomized in-frame fusion polynucleotide collection is flanked by a conserved sequence (SEQ ID NO:131) at its 5′ end and by a conserved sequence (SEQ ID NO:132) at its 3′ end. For re-assembly of 5′ ORFs into new randomized in-frame fusion polynucleotide collections (described below), 5′ ORFs are PCR amplified using oligonucleotides PG0185 (SEQ ID NO:135) and PG0186 (SEQ ID NO:136).

Each 3′ ORF prepared for the randomized in-frame fusion polynucleotide collection is flanked by a conserved sequence (SEQ ID NO:133) at its 5′ end and by a conserved sequence (SEQ ID NO:134) at its 3′ end. For re-assembly of 3′ ORFs into new randomized in-frame fusion polynucleotide collections (described below), 3′ ORFs are PCR amplified using oligonucleotides PG0187 (SEQ ID NO:137) and PG0188 (SEQ ID NO:138).

Sequence Amplification General Method

All PCR amplifications are performed using the following method.

The two amplification primers, each at a final concentration of 1.2 μM, are combined with 10 ng of template DNA, PCR buffer and thermostable polymerase in a total reaction volume of 50 μl. A high-fidelity thermostable polymerase such as Phusion™ Hot Start II thermostable high-fidelity polymerase (Thermo Scientific) can be used. For Phusion™ polymerase, the 5× HF amplification buffer supplied with the enzyme is used for all amplifications. All amplifications are performed on T100 thermal cyclers (Bio-Rad Laboratories) containing 96-well blocks. The deoxynucleotide triphosphates (dNTPs) used in all amplifications are a stock containing 10 mM of each dNTP, also obtained from Thermo Scientific. Deionized water is used in all reactions and to make all solutions not supplied with the polymerase. PCR amplicons are generated by denaturing at 95° C. for 2 minutes followed by 10-35 cycles of: 20 seconds at 95° C., 20 seconds at 60° C. and 1 min/kb at 72° C. (with a minimum of 30 seconds at 72° C.). The efficiency of formation of the PCR product is measured by agarose gel electrophoresis or by fluorescence spectroscopy using a fluorometer such as a Qubit® fluorometer (Life Technologies). The products of successful PCR reactions can be purified using silica resins suitable for DNA purification. Unsuccessful reactions are repeated, varying the Mg²⁺ concentration in the PCR reaction and/or other reaction conditions. Following successful amplification of each ORF, the concentration of each PCR product is normalized, and products corresponding to specific size ranges are pooled.

All PCR amplifications follow the same general procedure:

1. A PCR mix as described above is prepared for each stage of the PCR reaction, and is kept cold until inserted into the thermal cycler.

2. The samples are mixed thoroughly and then centrifuged at 4000 rpm for 1 minute to bring the reaction contents to the bottom of the tube or well in a plate.

3. The plates or tubes are inserted into a thermal cycler.
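The cycling conditions given in the general method (2 minutes at 95° C., then 10-35 cycles of 20 seconds at 95° C., 20 seconds at 60° C. and 1 min/kb at 72° C. with a 30 second minimum) can be written out as a short calculation. The following Python sketch is illustrative only; the function names and the default of 30 cycles are assumptions made for the example and are not part of the method as described.

```python
def extension_seconds(amplicon_bp, rate_s_per_kb=60.0, minimum_s=30.0):
    """Extension time at 72 C: 1 min per kb, never less than 30 s."""
    if amplicon_bp <= 0:
        raise ValueError("amplicon length must be positive")
    return max(minimum_s, rate_s_per_kb * amplicon_bp / 1000.0)


def cycling_program(amplicon_bp, cycles=30):
    """Return a list of (step, temperature_C, seconds) tuples for one run.

    The default of 30 cycles is an assumed value inside the 10-35 range.
    """
    if not 10 <= cycles <= 35:
        raise ValueError("cycle number is expected to lie between 10 and 35")
    program = [("initial denaturation", 95.0, 120.0)]
    extend = extension_seconds(amplicon_bp)
    for _ in range(cycles):
        program.append(("denature", 95.0, 20.0))
        program.append(("anneal", 60.0, 20.0))
        program.append(("extend", 72.0, extend))
    return program


if __name__ == "__main__":
    # A 300 bp amplicon falls under the 30 s minimum; a 2 kb amplicon does not.
    print(extension_seconds(300))
    print(extension_seconds(2000))
    total = sum(seconds for _, _, seconds in cycling_program(2000, cycles=30))
    print(f"hold time excluding ramping: {total / 60:.1f} min")
```

Ramp times between temperatures are ignored, so the printed total underestimates the actual run time on a thermal cycler.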

Selective Conditions Used for Selection of Heat and Salt Tolerance

Populations of E. coli cells transformed with the randomized in-frame fusion polynucleotide collection are selected for randomized in-frame fusion polynucleotide constructs conferring enhanced cell viability at high temperature (47-50° C. for 48-72 hours), or in the presence of high concentrations of salt (1.5-2.5 M NaCl for 72 hours to 7 days). All selections are performed in LB liquid medium (per liter, 10 g tryptone, 5 g yeast extract, 10 g NaCl) containing 50 μg/ml carbenicillin and IPTG or lactose to induce expression of the randomized in-frame fusion polynucleotides from the lac promoter; all chemicals are purchased from Thermo Scientific. The cells are allowed to grow for 30 minutes at 37° C. on a shaker before inoculating the selections at a cell density of approximately 10⁷ cells/ml. Initial selections, performed with high-complexity collections of randomized in-frame fusion polynucleotides, are performed in 50-100 ml of liquid medium in 500 ml Erlenmeyer shake flasks tightly closed with a screw top to prevent medium evaporation during selection. Subsequent selections performed with smaller collections of randomized in-frame fusion polynucleotides can be performed in lower volumes at the same overall cell density. The volume of a selective culture can be chosen such that the total number of cells is at least 20-fold the total expected number of randomized in-frame fusion polynucleotides being tested.
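The relationship between culture volume, cell density and collection size stated above can be illustrated with a short calculation. The Python sketch below is an example under stated assumptions: the function names are hypothetical, and the collection sizes used are arbitrary illustrative numbers, not values taken from this Example.

```python
def min_culture_volume_ml(collection_size, coverage=20.0, cells_per_ml=1e7):
    """Smallest culture volume giving `coverage` cells per distinct fusion.

    coverage=20 and cells_per_ml=1e7 follow the text; collection_size is
    whatever number of distinct fusion polynucleotides is expected.
    """
    if collection_size <= 0:
        raise ValueError("collection size must be positive")
    return coverage * collection_size / cells_per_ml


def max_collection_size(volume_ml, coverage=20.0, cells_per_ml=1e7):
    """Largest collection that a given culture volume covers at `coverage`."""
    return volume_ml * cells_per_ml / coverage


if __name__ == "__main__":
    # Hypothetical collection of ten million distinct fusions.
    print(min_culture_volume_ml(10_000_000))   # volume in ml
    # Collection sizes covered 20-fold by the 50 and 100 ml flask cultures.
    print(max_collection_size(50.0), max_collection_size(100.0))
```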

After selection, the cells are collected by centrifugation, plated on LB solid medium (per liter, 10 g tryptone, 5 g yeast extract, 10 g NaCl, 15 g Bacto agar) containing 50 μg/ml carbenicillin and allowed to grow overnight at 37° C. Colonies or lawns of cells arising after overnight growth are removed from the plates by scraping with glass beads. This is done by adding 5 ml LB broth+50 μg/ml carbenicillin to each 10 cm plate together with ten 4 mm glass beads. Proportionally higher volumes of medium are added to larger plates. The plates are swirled and shaken so that the glass beads dislodge the bacterial cells from the surface of the agar, and the resuspended cells are collected with a pipet, using additional medium to wash any remaining cells off the plate, if desired. Cells collected in this fashion are pelleted by centrifugation at 4000 rpm for 15 minutes and plasmid DNA is isolated using a silica resin column such as the Macherey Nagel NucleoSpin® Plasmid kit following the manufacturer's instructions.

The recovered plasmid DNA containing randomized in-frame fusion polynucleotides is transformed by electroporation into competent E. coli cells such as strain DH10B (Life Technologies) or EC100 (Epicentre Technologies). Alternative strains can be used if so desired for the subsequent round of phenotypic selections. 1 μl DNA is combined with 25 μl electrocompetent cells on ice, transferred into a 1 mm gap size electroporation cuvette, and electroporated at 1.5 kV using a Bio-Rad MicroPulser™ electroporator. Cells are suspended in 0.5 ml LB broth, allowed to recover for 1 hour at 37° C. on a shaker and plated in 0.25 ml aliquots onto 10 cm plates containing LB agar medium with 50 μg/ml carbenicillin. Colonies are grown overnight at 37° C. Colonies or lawns of cells arising after overnight growth are removed from the plates by scraping with glass beads, resuspended in LB medium containing 50 μg/ml carbenicillin and IPTG or lactose to induce expression of the fusion polynucleotides from the lac promoter, and are subjected to another round of selection as described above, if desired.

Iterative Selection of Randomized In-Frame Fusion Polynucleotides Conferring Heat and Salt Tolerance and Resistance

For iterative selection of randomized in-frame fusion polynucleotides conferring tolerance and resistance to heat and salt, populations of E. coli cells transformed with randomized in-frame fusion polynucleotide collection DNA are subjected to repetitive selections such as the ones described above. This procedure allows for gradual enrichment of randomized in-frame fusion polynucleotides conferring tolerance and/or resistance to heat and salt, and for isolation of in-frame fusion polynucleotides containing the best combinations of ORFs for conferring tolerance to lethal temperatures and salt concentrations.

Alternatively, after a round of selection, the 5′ ORFs and 3′ ORFs contained in the randomized in-frame fusion polynucleotides recovered from survivors of the selection are re-isolated by PCR amplification and then recombined with each other to form a new re-assembled randomized in-frame fusion polynucleotide collection. This process may allow new sequence combinations to arise, yielding in-frame fusion polynucleotides capable of conferring traits of interest. Cell populations containing collections of re-assembled in-frame fusion polynucleotides may contain randomized in-frame fusion polynucleotides with different sequence combinations, or with sequence combinations at different frequencies or concentrations, compared to the initial randomized in-frame fusion polynucleotide collection, or compared to smaller populations of randomized in-frame fusion polynucleotides selected directly from the initial collection as described above. The sequence combinations formed by the re-assembly process may confer better protection against heat and salt, resulting in more desirable phenotypic values of transformants containing such sequence combinations.
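The way re-assembly can generate sequence combinations that were absent from the selected population can be illustrated by enumerating the pairings of recovered 5′ ORFs with recovered 3′ ORFs. The Python sketch below is a simplified illustration: the ORF labels are hypothetical placeholders, and it lists every possible pairing once, whereas an actual re-assembled collection samples these pairings at frequencies that depend on the abundance of each amplified ORF.

```python
from itertools import product


def reassemble(selected_fusions):
    """All 5' x 3' pairings of the ORFs found in the selected fusions."""
    five_prime = sorted({five for five, _ in selected_fusions})
    three_prime = sorted({three for _, three in selected_fusions})
    return list(product(five_prime, three_prime))


def novel_combinations(selected_fusions):
    """Pairings possible after re-assembly that were not in the input set."""
    seen = set(selected_fusions)
    return [pair for pair in reassemble(selected_fusions) if pair not in seen]


if __name__ == "__main__":
    # Hypothetical survivors of one selection round: (5' ORF, 3' ORF) pairs.
    survivors = [("orfA", "orfX"), ("orfB", "orfY"), ("orfA", "orfY")]
    print(len(reassemble(survivors)))        # 2 x 2 possible pairings
    print(novel_combinations(survivors))     # pairing not seen before
```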

The randomized in-frame fusion polynucleotide plasmid DNA isolated from cells/colonies surviving heat or salt selections is used to re-amplify the 5′ ORFs and 3′ ORFs present in the randomized in-frame fusion polynucleotides. The amplification is performed as described above, using oligonucleotides PG0185 (SEQ ID NO:135) and PG0186 (SEQ ID NO:136) for amplifying the 5′ ORFs, and using oligonucleotides PG0187 (SEQ ID NO:137) and PG0188 (SEQ ID NO:138) for amplifying the 3′ ORFs. Optionally, mutations can be introduced into the 5′ ORFs and 3′ ORFs in the course of PCR amplification, using either lower-fidelity thermostable polymerases (Cline 1996, Biles 2004), or PCR-based incorporation of mutagenic nucleotide analogs such as 8-oxo-dGTP, dPTP, 5-bromo-dUTP, 2-hydroxy-dATP and dITP (Spee 1993, Kuipers 1996, Zaccolo 1996, Zaccolo 1999, Kamiya 2004, Kamiya 2007, Ma 2008, Petrie 2010, Wang 2012a). The re-amplified 5′ ORFs and 3′ ORFs are electrophoresed on a 1.5% agarose gel to remove amplification products below 200 bp in size and are purified from the gel using a commercial kit. The pUC19 vector DNA is also re-amplified and purified after electrophoresis.

Re-Assembly of Re-Amplified ORFs into a Randomized In-Frame Fusion Polynucleotide Collection

The purified, re-amplified 5′ ORFs and 3′ ORFs are re-assembled with the pUC19 vector DNA as described below and are introduced into E. coli as a new collection of randomized in-frame fusion polynucleotides using the assembly methods described below. The resulting plasmid DNA is purified using a commercially available plasmid purification kit, following the manufacturer's recommendations.

The re-assembly is done using a ligation-independent cloning method. The following single-tube procedure uses a single-stranded exonuclease to create single-stranded tails at the ends of the DNA molecules to be assembled, followed by annealing of the homologous ends and fill-in of the remaining single-stranded regions. The purified DNA fragments resulting from PCR amplification of the expression vector backbone and the 5′ and 3′ ORFs (see above) are combined in roughly equimolar amounts for a total of approximately 100 ng DNA in a 10 μl reaction. A 10× assembly buffer (500 mM Tris-HCl pH 8.0, 100 mM MgCl₂, 100 mM β-mercaptoethanol, 1 mM each of the 4 dNTPs) is added to produce a 1× concentration. Also added to the reaction are 0.01 unit of a single-stranded exonuclease and 1 unit of a thermostable, high-fidelity hot-start polymerase such as Phusion™ polymerase. Hot start implies that at physiological temperatures the polymerase is in an inactive form, for example by being bound by an antibody or other compound, which prevents it from competing with the exonuclease for binding to DNA ends in the early stages of the reaction. The reaction volume is adjusted to 10 μl, and the reaction is mixed gently and incubated at 37° C. for 5 minutes, allowing the exonuclease to act on the DNA ends. The temperature is then raised to 50-60° C. to inactivate the exonuclease and activate the polymerase while promoting annealing of single-stranded ends of the DNA molecules. The mixture is incubated at 50-60° C. for 30 minutes and the temperature is then reduced to 4° C. to stop the reaction. The reaction can be performed in a PCR machine for efficient temperature changes. After completion, the reaction mixture can be stored at −20° C. and is ready to be transformed into competent E. coli as described above.
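Combining fragments "in roughly equimolar amounts" for a fixed total mass means that each fragment contributes mass in proportion to its length, because the mass of one mole of double-stranded DNA scales with the number of base pairs. The Python sketch below shows this split for a 100 ng reaction. The fragment lengths in the usage example are hypothetical round numbers chosen for illustration (for pooled ORFs an average length would be used), and the function name is an assumption of this sketch.

```python
def equimolar_masses_ng(lengths_bp, total_ng=100.0):
    """Split total_ng among fragments so that all are present in equal moles.

    lengths_bp maps a fragment label to its length (or average length) in bp.
    For equal molar amounts, mass is proportional to length.
    """
    if not lengths_bp or any(bp <= 0 for bp in lengths_bp.values()):
        raise ValueError("every fragment needs a positive length")
    total_len = sum(lengths_bp.values())
    return {name: total_ng * bp / total_len for name, bp in lengths_bp.items()}


if __name__ == "__main__":
    # Hypothetical lengths: vector backbone and average 5' and 3' ORF sizes.
    fragments = {"vector": 2500, "5' ORF pool": 1000, "3' ORF pool": 1500}
    for name, ng in equimolar_masses_ng(fragments).items():
        print(f"{name}: {ng:.1f} ng")
```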

Exonucleases that are suitable for this procedure include T4 DNA polymerase, Exonuclease III, lambda exonuclease, T5 exonuclease and T7 exonuclease. Exonucleases with 5′ to 3′ directionality (i.e., lambda exonuclease, T5 exonuclease or T7 exonuclease) are preferred as they result in higher numbers of base pairs of annealed sequence between the two nicks at each cloning junction, thus stabilizing the desired product. The reaction may also be supplemented with polyethylene glycol (molecular weight 4000-10000) at a final concentration of 5-10% to promote annealing of single-stranded DNA ends, if desired.

After production of new re-assembled randomized in-frame fusion polynucleotide collections, populations of cells are transformed with these collections using methods similar to those described above. The populations of transformed cells are then again exposed to selective conditions to select cells containing plasmids and polynucleotides conferring heat and salt tolerance.

The phenotypic values conferred by individual in-frame fusion polynucleotides isolated from randomized in-frame fusion polynucleotide collections using any of the selection methods described above can be measured and compared between different transformants to find the randomized in-frame fusion polynucleotides conferring the highest level of protection.

Testing of Individual, Randomized In-Frame Fusion Polynucleotides for Conferral of Heat and Salt Tolerance in Cell Survival Assays

Plasmid DNA isolated from colonies or lawns of cells grown from survivors of lethal heat or salt selections is transformed into competent E. coli cells and plated at low cell densities onto LB agar plates containing 50 μg/ml carbenicillin. Individual colonies are placed into 96-well deep-well plates and grown overnight at 37° C., each well containing 1 ml LB with 50 μg/ml carbenicillin. Certain wells in the plate are reserved for cells transformed with control plasmids that either lack an insert or contain inserts known not to confer heat or salt tolerance.

For heat tolerance cell survival assays, after overnight growth the cell densities of 10-15 cultures in different wells are measured by optical density at 600 nm on a spectrophotometer such as the NanoDrop™ Spectrophotometer (Thermo Scientific). The optical densities are averaged and a dilution factor is calculated for preparing test cultures at an OD600 of 0.01, which corresponds roughly to a cell density of 10⁷ cells/ml. The 96-well culture is then diluted by the calculated factor into a fresh plate with each well containing the appropriate amount of LB medium with 50 μg/ml carbenicillin and either IPTG or lactose to induce expression of the in-frame fusion polynucleotides. This selection plate is incubated at 48° C. for 48-72 hours while shaking at 250 rpm. After selection, the cells in each well are thoroughly resuspended by pipetting and diluted 1:10 in LB medium in a separate plate, and 3 μl aliquots of the undiluted and diluted selections are spotted in arrays of 96 spots, representing 48 randomized in-frame fusion polynucleotides, onto LB agar containing 50 μg/ml carbenicillin using a Bel-Art 96-well replicating tool (Bel-Art Products). The spots are allowed to dry and the plates are incubated at 37° C. overnight.

For salt tolerance cell survival assays, after overnight growth the cell densities of 10-15 cultures in different wells are measured by optical density at 600 nm on a spectrophotometer such as the NanoDrop™ Spectrophotometer (Thermo Scientific). The optical densities are averaged and a dilution factor is calculated for preparing test cultures at an OD600 of 0.01, which corresponds roughly to a cell density of 10⁷ cells/ml. The 96-well culture is then diluted by the calculated factor into a fresh plate with each well containing the appropriate amount of LB medium with 50 μg/ml carbenicillin, 2.5 M NaCl and either IPTG or lactose to induce expression of the in-frame fusion polynucleotides. The plate is incubated at 37° C. for 48-72 hours while shaking at 250 rpm. After selection, the cells in each well are thoroughly resuspended by pipetting and diluted 1:10 in LB medium in a separate plate, and 3 μl aliquots of the undiluted and diluted selections are spotted in arrays of 96 spots, representing 48 randomized in-frame fusion polynucleotides, onto LB agar with 50 μg/ml carbenicillin using a Bel-Art 96-well replicating tool (Bel-Art Products). The spots are allowed to dry and the plates are incubated at 37° C. overnight.
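The dilution step used in both survival assays (averaging the OD600 of 10-15 wells and diluting the whole plate to an OD600 of 0.01) can be sketched as follows. This Python example is illustrative: the OD600 readings are made-up values, and the final well volume of 1000 μl is an assumption, since the text specifies only "the appropriate amount" of medium.

```python
from statistics import mean


def dilution_factor(od600_readings, target_od=0.01):
    """Fold dilution that brings the average culture to the target OD600."""
    if not od600_readings:
        raise ValueError("at least one OD600 reading is required")
    return mean(od600_readings) / target_od


def well_volumes_ul(factor, final_volume_ul=1000.0):
    """Return (culture_ul, medium_ul) for one well at the given dilution.

    final_volume_ul is an assumed working volume, not taken from the text.
    """
    culture = final_volume_ul / factor
    return culture, final_volume_ul - culture


if __name__ == "__main__":
    readings = [2.0, 2.5, 3.0]            # hypothetical overnight OD600 values
    factor = dilution_factor(readings)
    culture_ul, medium_ul = well_volumes_ul(factor)
    print(f"dilute {factor:.0f}-fold: {culture_ul:.1f} ul culture "
          f"+ {medium_ul:.1f} ul medium")
```

A single plate-wide factor, as described in the text, treats all wells as having similar overnight densities; wells that grew much better or worse than the sampled average will start the selection at a different density.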

The intensity of the cell spots on the test plates resulting from the heat and salt tolerance cell survival assays after overnight growth is indicative of the extent of cell survival under selection. Spot intensities are scored on a scale of 0-3, 0 being no growth, 1 slight growth, 2 significant growth and 3 confluent growth. Both spots, resulting from the two dilutions, are taken into account to generate the score. This method allows identification of the best randomized in-frame fusion polynucleotides conferring heat and salt tolerance.

A panel of two plate images, with results of the heat and salt tolerance cell survival assay for 48 in-frame fusion polynucleotides, is shown in FIG. 6. The heat tolerance assays represented in FIG. 6 were performed with a stringent heat selection of 72 hours at 48° C., which resulted in survival of only the bacterial cultures harboring in-frame fusion polynucleotide clones capable of conferring maximal heat tolerance.

Table 3 shows averaged scores from a set of 24 randomized in-frame fusion polynucleotides selected from a randomized in-frame fusion polynucleotide collection by 2 rounds of iterative selection, compared to 160 randomized in-frame fusion polynucleotides selected from a randomized in-frame fusion polynucleotide collection that was re-assembled from plasmids isolated from survivors of a single round of heat or salt selections, and subjected to one more round of heat or salt selection after re-assembly. The 24 randomized in-frame fusion polynucleotides isolated by direct selection were chosen from a larger set of 192 clones that had been tested earlier for heat and salt tolerance by cell survival assays as described above. The 24 polynucleotides chosen from this set of 192 clones were the ones with the best heat and salt tolerance phenotypes, and were used for comparison to the 160 polynucleotides resulting from the re-assembly process. The data from this comparison are shown in FIG. 6 and Table 3.

The scores in Table 3 can be considered phenotypic values conferred by each randomized in-frame fusion polynucleotide for each selection imposed on the transformants. High scores represent high phenotypic values conferred by the corresponding randomized in-frame fusion polynucleotide.

Two types of in-frame fusion polynucleotides are represented in Table 3. Twenty-four randomly selected randomized in-frame fusion polynucleotides (M43-A04, M43-C04, M43-D09, M44-D01, M44-D09, M44-F05, M44-F07, M44-F08, M44-F10, M44-F11, M44-F12, M44-G01, M44-G02, M44-G04, M44-G05, M44-G06, M44-G07, M44-G08, M44-G09, M44-G10, M44-G11, M44-H01, M44-H03 and M44-H04) were selected by two iterative rounds of selection. These 24 randomized in-frame fusion polynucleotides are referred to as type “Direct selection” in Table 3. The remaining 160 randomized in-frame fusion polynucleotides were selected by 1 round of direct selection, followed by 1 round of PCR amplification and re-assembly, followed by another round of direct selection and picking of random colonies formed from cells surviving the last round of selection. These 160 randomized in-frame fusion polynucleotides are referred to as type “Re-assembly” in Table 3.

Average scores are computed separately for the two classes of randomized in-frame fusion polynucleotides, for both the heat and salt selections shown in Table 3. The average scores are shown at the bottom of the Table. For both heat and salt selection, the average scores for the re-assembled randomized in-frame fusion polynucleotides are higher than those of the directly selected randomized in-frame fusion polynucleotides. These data indicate that a re-assembly step can be advantageous for isolating in-frame fusion polynucleotides conferring high phenotypic values.
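The per-class averaging described above amounts to grouping the score records by type and taking the mean of each score column. The Python sketch below shows one way to do this. The four records used in the example are individual rows copied from Table 3 for illustration only; their averages are not the averages reported at the bottom of the Table, which are computed over all 184 entries. The function name is an assumption of this sketch.

```python
from collections import defaultdict


def average_scores(records):
    """Mean salt and heat scores per selection type.

    Each record is (fusion_name, selection_type, salt_score, heat_score).
    Returns {selection_type: (mean_salt, mean_heat)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for _name, kind, salt, heat in records:
        entry = sums[kind]
        entry[0] += salt
        entry[1] += heat
        entry[2] += 1
    return {kind: (s / n, h / n) for kind, (s, h, n) in sums.items()}


if __name__ == "__main__":
    # Four-row excerpt of Table 3, used only to demonstrate the grouping.
    excerpt = [
        ("M43-C04", "Direct selection", 0.00, 2.00),
        ("M44-D09", "Direct selection", 0.90, 0.00),
        ("M47-H01", "Re-assembly", 1.90, 0.00),
        ("M48-D01", "Re-assembly", 0.00, 3.00),
    ]
    for kind, (salt, heat) in average_scores(excerpt).items():
        print(f"{kind}: salt {salt:.3f}, heat {heat:.3f}")
```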

Characterization of Positive Randomized In-Frame Fusion Polynucleotides and Additional Screens

Randomized in-frame fusion polynucleotide expression constructs conferring the most dramatic or broad phenotypes are sequenced to identify the randomized in-frame fusion polynucleotides.

TABLE 3 Table 3: Resistance and tolerance activities of 184 fusion polynucleotides in E. coli Salt Heat Salt Heat Fusion tolerance tolerance Fusion tolerance tolerance gene name Type score score gene name Type score score M43-A04 Direct selection 0.00 0.00 M47-G12 Re-assembly 0.00 0.00 M43-C04 Direct selection 0.00 2.00 M47-H01 Re-assembly 1.90 0.00 M43-D09 Direct selection 0.00 0.00 M47-H02 Re-assembly 1.90 0.00 M44-D01 Direct selection 0.00 0.00 M47-H03 Re-assembly 1.90 0.00 M44-D09 Direct selection 0.90 0.00 M47-H04 Re-assembly 0.00 0.00 M44-F05 Direct selection 0.00 0.00 M47-H05 Re-assembly 1.90 0.00 M44-F07 Direct selection 0.00 0.00 M47-H06 Re-assembly 0.00 0.00 M44-F08 Direct selection 0.00 1.00 M47-H07 Re-assembly 0.00 0.00 M44-F10 Direct selection 0.00 0.00 M47-H09 Re-assembly 0.00 0.00 M44-F11 Direct selection 0.00 0.00 M47-H10 Re-assembly 1.90 0.00 M44-F12 Direct selection 0.00 0.00 M47-H11 Re-assembly 1.90 0.00 M44-G01 Direct selection 0.00 0.00 M47-H12 Re-assembly 0.90 0.00 M44-G02 Direct selection 0.00 0.00 M48-A02 Re-assembly 0.00 0.00 M44-G04 Direct selection 0.00 1.00 M48-A03 Re-assembly 1.90 0.00 M44-G05 Direct selection 0.00 0.00 M48-A05 Re-assembly 0.00 0.00 M44-G06 Direct selection 0.00 0.00 M48-A06 Re-assembly 0.00 0.00 M44-G07 Direct selection 0.00 0.00 M48-A07 Re-assembly 0.00 0.00 M44-G08 Direct selection 0.00 0.00 M48-A08 Re-assembly 0.00 0.00 M44-G09 Direct selection 0.00 0.00 M48-A09 Re-assembly 1.90 0.00 M44-G10 Direct selection 0.00 0.00 M48-A10 Re-assembly 1.90 0.00 M44-G11 Direct selection 0.00 0.00 M48-A11 Re-assembly 1.90 0.00 M44-H01 Direct selection 0.00 0.00 M48-B01 Re-assembly 1.90 0.00 M44-H03 Direct selection 0.00 0.00 M48-B03 Re-assembly 0.90 0.00 M44-H04 Direct selection 0.00 0.00 M48-B04 Re-assembly 0.00 0.00 M47-A02 Re-assembly 1.90 0.00 M48-B05 Re-assembly 0.00 0.00 M47-A03 Re-assembly 1.90 0.00 M48-B06 Re-assembly 1.90 0.00 M47-A05 Re-assembly 1.90 0.00 M48-B07 Re-assembly 0.00 0.00 M47-A06 Re-assembly 1.90 0.00 M48-B08 
Re-assembly 1.90 0.00 M47-A07 Re-assembly 0.00 0.00 M48-B09 Re-assembly 1.90 0.00 M47-A08 Re-assembly 1.90 0.00 M48-B10 Re-assembly 0.00 0.00 M47-A09 Re-assembly 0.00 0.00 M48-B11 Re-assembly 0.00 0.00 M47-A10 Re-assembly 1.90 0.00 M48-B12 Re-assembly 0.00 0.00 M47-A11 Re-assembly 0.90 0.00 M48-C01 Re-assembly 0.90 0.00 M47-B01 Re-assembly 1.90 0.00 M48-C02 Re-assembly 1.90 0.00 M47-B03 Re-assembly 0.90 0.00 M48-C04 Re-assembly 0.00 0.00 M47-B04 Re-assembly 0.90 0.00 M48-C05 Re-assembly 1.90 0.00 M47-B05 Re-assembly 0.00 0.00 M48-C06 Re-assembly 0.90 0.00 M47-B06 Re-assembly 0.90 0.00 M48-C07 Re-assembly 0.00 0.00 M47-B07 Re-assembly 0.00 0.00 M48-C08 Re-assembly 0.00 0.00 M47-B08 Re-assembly 0.90 0.00 M48-C09 Re-assembly 0.00 0.00 M47-B09 Re-assembly 1.90 0.00 M48-C10 Re-assembly 0.00 2.00 M47-B10 Re-assembly 1.90 0.00 M48-C12 Re-assembly 0.00 0.00 M47-B11 Re-assembly 0.00 0.00 M48-D01 Re-assembly 0.00 3.00 M47-B12 Re-assembly 1.90 0.00 M48-D02 Re-assembly 1.90 0.00 M47-C01 Re-assembly 1.90 0.00 M48-D03 Re-assembly 0.00 0.00 M47-C02 Re-assembly 0.90 0.00 M48-D05 Re-assembly 0.00 3.00 M47-C04 Re-assembly 0.00 0.00 M48-D06 Re-assembly 0.00 0.00 M47-C05 Re-assembly 0.90 0.00 M48-D07 Re-assembly 0.90 1.00 M47-C06 Re-assembly 1.90 0.00 M48-D08 Re-assembly 0.00 3.00 M47-C07 Re-assembly 1.90 0.00 M48-D10 Re-assembly 0.00 2.00 M47-C08 Re-assembly 0.00 0.00 M48-D11 Re-assembly 0.00 2.00 M47-C09 Re-assembly 1.90 0.00 M48-D12 Re-assembly 0.00 3.00 M47-C10 Re-assembly 1.90 0.00 M48-E01 Re-assembly 0.00 0.00 M47-C12 Re-assembly 0.90 0.00 M48-E02 Re-assembly 0.00 3.00 M47-D01 Re-assembly 0.90 0.00 M48-E03 Re-assembly 0.00 0.00 M47-D02 Re-assembly 0.00 0.00 M48-E04 Re-assembly 0.00 0.00 M47-D03 Re-assembly 0.00 0.00 M48-E06 Re-assembly 1.90 0.00 M47-D05 Re-assembly 0.00 0.00 M48-E07 Re-assembly 0.00 0.00 M47-D06 Re-assembly 0.00 0.00 M48-E08 Re-assembly 0.00 0.00 M47-D07 Re-assembly 0.00 0.00 M48-E09 Re-assembly 0.00 0.00 M47-D08 Re-assembly 0.00 0.00 M48-E11 Re-assembly 1.90 
0.00 M47-D11 Re-assembly 1.90 0.00 M48-E12 Re-assembly 0.00 0.00 M47-D12 Re-assembly 0.00 0.00 M48-F01 Re-assembly 0.00 0.00 M47-E01 Re-assembly 0.00 0.00 M48-F03 Re-assembly 0.90 2.00 M47-E02 Re-assembly 1.90 0.00 M48-F04 Re-assembly 0.00 0.00 M47-E03 Re-assembly 0.00 0.00 M48-F05 Re-assembly 0.00 2.00 M47-E04 Re-assembly 0.00 0.00 M48-F07 Re-assembly 0.00 3.00 M47-E06 Re-assembly 1.90 0.00 M48-F08 Re-assembly 0.00 1.00 M47-E07 Re-assembly 1.90 0.00 M48-F09 Re-assembly 0.00 3.00 M47-E08 Re-assembly 1.90 0.00 M48-F10 Re-assembly 0.90 0.00 M47-E09 Re-assembly 0.00 0.00 M48-F11 Re-assembly 0.90 0.00 M47-E10 Re-assembly 0.90 0.00 M48-F12 Re-assembly 0.00 0.00 M47-E11 Re-assembly 1.90 0.00 M48-G01 Re-assembly 0.90 0.00 M47-E12 Re-assembly 0.90 0.00 M48-G02 Re-assembly 0.00 0.00 M47-F01 Re-assembly 0.00 0.00 M48-G03 Re-assembly 0.00 0.00 M47-F03 Re-assembly 1.90 0.00 M48-G05 Re-assembly 0.00 0.00 M47-F04 Re-assembly 1.90 0.00 M48-G06 Re-assembly 0.00 0.00 M47-F05 Re-assembly 1.90 0.00 M48-G08 Re-assembly 0.00 0.00 M47-F07 Re-assembly 0.00 0.00 M48-G10 Re-assembly 0.00 0.00 M47-F08 Re-assembly 0.00 0.00 M48-G11 Re-assembly 1.90 0.00 M47-F09 Re-assembly 1.90 0.00 M48-G12 Re-assembly 0.00 0.00 M47-F10 Re-assembly 0.90 0.00 M48-H01 Re-assembly 0.00 0.00 M47-F11 Re-assembly 0.90 0.00 M48-H02 Re-assembly 0.00 0.00 M47-F12 Re-assembly 0.00 1.00 M48-H03 Re-assembly 0.00 0.00 M47-G01 Re-assembly 1.90 0.00 M48-H04 Re-assembly 1.90 0.00 M47-G02 Re-assembly 1.90 0.00 M48-H05 Re-assembly 0.00 0.00 M47-G03 Re-assembly 0.90 0.00 M48-H06 Re-assembly 0.00 0.00 M47-G05 Re-assembly 0.90 0.00 M48-H07 Re-assembly 0.00 0.00 M47-G06 Re-assembly 0.00 0.00 M48-H09 Re-assembly 0.90 0.00 M47-G08 Re-assembly 1.90 1.00 M48-H10 Re-assembly 0.00 0.00 M47-G10 Re-assembly 1.90 0.00 M48-H11 Re-assembly 1.90 0.00 M47-G11 Re-assembly 1.90 0.00 M48-H12 Re-assembly 0.00 0.00 Average Direct selection 0.013 0.167 Average Re-assembly 0.728 0.219

REFERENCES

Biles B D, Connolly B A (2004). Low-fidelity Pyrococcus furiosus DNA polymerase mutants useful in error-prone PCR. Nucleic Acids Res. 32(22):e176.

Brachmann C B, Davies A, Cost G J, Caputo E, Li J, Hieter P, Boeke J D (1998). Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14(2):115-132.

Chenna R, Sugawara H, Koike T, Lopez R, Gibson T J, Higgins D G, Thompson J D (2003). Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31(13):3497-3500.

Cline J, Braman J C, Hogrefe H H (1996). PCR fidelity of pfu DNA polymerase and other thermostable DNA polymerases. Nucleic Acids Res. 24(18):3546-3551.

da Costa L J, Tanuri A (1998). Use of T7 gene 6 exonuclease and phosphorothioated primers for the manipulation of HIV-1 infectious clones. J Virol Methods 72(1):117-121.

Ding J, Huang X, Zhang L, Zhao N, Yang D, Zhang K (2009). Tolerance and stress response to ethanol in the yeast Saccharomyces cerevisiae. Appl Microbiol Biotechnol. 85(2):253-263.

Dismukes G C, Carrieri D, Bennette N, Ananyev G M, Posewitz M C (2008). Aquatic phototrophs: efficient alternatives to land-based crops for biofuels. Curr Opin Biotechnol. 19(3):235-240.

Dolganov N, Grossman A R (1993). Insertional inactivation of genes to isolate mutants of Synechococcus sp. strain PCC 7942: isolation of filamentous strains. J Bacteriol. 175(23):7644-7651.

Dunlop M J (2011). Engineering microbes for tolerance to next-generation biofuels. Biotechnol Biofuels 4:32.

Funk M, Niedenthal R, Mumberg D, Brinkmann K, Ronicke V, Henkel T (2002). Vector systems for heterologous expression of proteins in Saccharomyces cerevisiae. Methods Enzymol. 350:248-57.

Gibson D G, Young L, Chuang R Y, Venter J C, Hutchison C A 3rd, Smith H O (2009). Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods. 6(5):343-345.

Gibson D G, Smith H O, Hutchison C A 3rd, Venter J C, Merryman C (2010). Chemical synthesis of the mouse mitochondrial genome. Nat Methods. 7(11):901-903.

Gietz R D, Woods R A (2002). Transformation of yeast by lithium acetate/single-stranded carrier DNA/polyethylene glycol method. Methods Enzymol. 350:87-96.

Gietz R D, Woods R A (2006). Yeast transformation by the LiAc/SS Carrier DNA/PEG method. Methods Mol Biol. 313:107-120.

Gietz R D, Schiestl R H (2007). High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat Protocols 2(1):31-34.

Irwin C R, Farmer A, Willer D O, Evans D H (2012). In-fusion® cloning with vaccinia virus DNA polymerase. Methods Mol Biol. 890:23-35.

Jang Y S, Kim B, Shin J H, Choi Y J, Choi S, Song C W, Lee J, Park H G, Lee S Y (2012). Bio-based production of C2-C6 platform chemicals. Biotechnol Bioeng. 109(10):2437-2459.

Jia K, Zhang Y, Li Y (2009). Systematic engineering of microorganisms to improve alcohol tolerance. Engineering in Life Sciences 10(5):422-429.

Kamiya H, Ito M, Harashima H (2004). Induction of transition and transversion mutations during random mutagenesis PCR by the addition of 2-hydroxy-dATP. Biol Pharm Bull. 27(5):621-623.

Kamiya H, Ito M, Harashima H (2007). Induction of various mutations during PCRs with manganese and 8-hydroxy-dGTP. Biol Pharm Bull. 30(4):842-844.

Kuipers O P (1996). Random mutagenesis by using mixtures of dNTP and dITP in PCR. Methods Mol Biol. 57:351-356.

Lathe R, Kieny M P, Skory S, Lecocq J P (1984). Linker tailing: unphosphorylated linker oligonucleotides for joining DNA termini. DNA 3(2):173-182.

Lee J W, Na D, Park J M, Lee J, Choi S, Lee S Y (2012). Systems metabolic engineering of microorganisms for natural and non-natural chemicals. Nat Chem Biol. 8(6):536-546.

Li M Z, Elledge S J. (2007). Harnessing homologous recombination in vitro to generate recombinant DNA via SLIC. Nat Methods. 4(3): 251-256.

Li C, Wen A, Shen B, Lu J, Huang Y, Chang Y. (2011). FastCloning: a highly simplified, purification-free, sequence- and ligation-independent PCR cloning method. BMC Biotechnol. 11:92.

Li M Z, Elledge S J. (2012). SLIC: a method for sequence- and ligation-independent cloning. Methods Mol Biol. 852:51-59.

Liu X P, Liu J H (2010). The terminal 5′ phosphate and proximate phosphorothioate promote ligation-independent cloning. Protein Sci. 19(5):967-973.

Lobban P E, Kaiser A D (1973). Enzymatic end-to-end joining of DNA molecules. J Mol Biol. 78(3):453-471.

Ma X, Ke T, Mao P, Jin X, Ma L, He G (2008). The mutagenic properties of BrdUTP in a random mutagenesis process. Mol Biol Rep. 35(4):663-667.

Mascal M (2012). Chemicals from biobutanol: technologies and markets. Biofuels, Bioprod. Bioref. 6(4):483-493.

Petrie K L, Joyce G F (2010). Deep sequencing analysis of mutations resulting from the incorporation of dNTP analogs. Nucleic Acids Res. 38(22):8095-8104.

Quan J, Tian J (2009). Circular polymerase extension cloning of complex gene libraries and pathways. PLoS One. 4(7): e6441.

Quan J, Tian J (2011). Circular polymerase extension cloning for high-throughput cloning of complex and combinatorial DNA libraries. Nat Protoc. 6(2):242-251.

Sambrook J, Fritsch E F, Maniatis T (1989). Molecular Cloning: A Laboratory Manual, Second Ed. Cold Spring Harbor Laboratory Press, Plainview, N.Y.

Sikorski R S, Hieter P (1989). A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122(1):19-27.

Spee J H, de Vos W M, Kuipers O P (1993). Efficient random mutagenesis method with adjustable mutation frequency by use of PCR and dITP. Nucleic Acids Res. 21(3):777-778.

Thieme F, Engler C, Kandzia R, Marillonnet S (2011). Quick and clean cloning: a ligation-independent cloning strategy for selective cloning of specific PCR products from non-specific mixes. PLoS One 6(6): e20556.

Vroom J A, Wang C L (2008). Modular construction of plasmids through ligation-free assembly of vector components with oligonucleotide linkers. Biotechniques 44(7): 924-926.

Wang Z, Wang H Y, Feng H (2012a). A simple and reproducible method for directed evolution: combination of random mutation with dITP and DNA fragmentation with endonuclease V. Mol Biotechnol. 53(1):49-54.

Ward A C (1990). Single-step purification of shuttle vectors from yeast for high frequency back-transformation into E. coli. Nucleic Acids Res. 18(17):5319.

Zaccolo M, Williams D M, Brown D M, Gherardi E (1996). An approach to random mutagenesis of DNA using mixtures of triphosphate derivatives of nucleoside analogues. J Mol Biol. 255(4):589-603.

Zaccolo M, Gherardi E (1999). The effect of high-frequency random mutagenesis on in vitro protein evolution: a study on TEM-1 beta-lactamase. J Mol Biol. 285(2):775-783.

Zhu B, Cai G, Hall E O, Freeman G J (2007). In-fusion assembly: seamless engineering of multidomain fusion proteins, modular vectors, and mutations. BioTechniques 43:354-359.

All publications, databases, GenBank sequences, patents and patent applications cited in this Specification are herein incorporated by reference as if each was specifically and individually indicated to be incorporated by reference. 

1. A method of producing an organism with a new or altered phenotype, comprising: (a) introducing a composition comprising a plurality of random in-frame fusion polynucleotides having sequences different from each other into an organism to produce one or more transformed organisms, each random in-frame fusion polynucleotide comprising at least two full-length open reading frames, one designated as a 5′ full-length open reading frame and one designated as a 3′ full-length open reading frame, the 5′ full-length open reading frame being joined, either directly, or indirectly via at least one intervening open reading frame, to the 3′ full-length open reading frame to form a composite open reading frame encoding a fusion polypeptide; (b) isolating at least two transformed organisms exhibiting a different phenotype as compared to a control organism cultivated under the same conditions; (c) isolating at least the 5′ full-length open reading frame and the 3′ full-length open reading frame from at least two fusion polynucleotides present in the organisms of step (b); (d) randomly combining the isolated full-length open reading frames of step (c) to form at least two random in-frame polynucleotides, each comprising a composite open reading frame encoding a fusion polypeptide; (e) introducing the polynucleotide(s) of step (d) into an organism to produce one or more transformed organisms; and (f) isolating at least one transformed organism from the organism(s) in step (e) exhibiting a different phenotype as compared to a control organism cultivated under the same conditions, wherein steps (c) through (f) are optionally repeated one or more times.
 2. The method of claim 1, wherein each full-length 5′ open reading frame is selected from a collection comprising a plurality of full-length open reading frames having sequences different from each other and each full-length 3′ open reading frame is selected from a second collection comprising a plurality of full-length open reading frames having sequences different from each other.
 3. The method according to claim 1, wherein the at least two full-length open reading frames are nonhomologous.
 4.-10. (canceled)
 11. A method of producing an organism with a new or altered phenotype, comprising: (a) introducing a composition comprising a plurality of random in-frame fusion polynucleotides having sequences different from each other into an organism to produce one or more transformed organisms, each random in-frame fusion polynucleotide comprising at least two open reading frames isolated from the genome of the same species of organism, one designated as a 5′ open reading frame and one designated as a 3′ open reading frame, the 5′ open reading frame being joined either directly, or indirectly via at least one intervening open reading frame, to the 3′ open reading frame to form a composite open reading frame encoding a fusion polypeptide; (b) isolating at least two transformed organisms exhibiting a different phenotype as compared to a control organism cultivated under the same conditions; (c) isolating at least the 5′ open reading frame and the 3′ open reading frame from each of the at least two fusion polynucleotides present in the organisms of step (b); (d) randomly combining the isolated open reading frames of step (c) to form at least two random in-frame polynucleotides, each comprising a composite open reading frame encoding a fusion polypeptide; (e) introducing the polynucleotide(s) of step (d) into an organism to produce one or more transformed organisms; and (f) isolating at least one transformed organism from the organism(s) in step (e) exhibiting a different phenotype as compared to a control organism cultivated under the same conditions, wherein steps (c) through (f) are optionally repeated one or more times, wherein the organism is a fungus, an alga, a plant or an animal.
 12. The method of claim 11, wherein each 5′ open reading frame is selected from a collection comprising a plurality of open reading frames having sequences different from each other and each 3′ open reading frame is selected from a second collection comprising a plurality of open reading frames having sequences different from each other.
 13. The method according to claim 11, wherein the at least two open reading frames are nonhomologous.
 14. The method according to claim 11, wherein the at least two open reading frames are joined via a linker sequence.
 15. The method according to claim 14, wherein the linker sequence is 1 to 1,000 codons in length.
 16. The method according to claim 11, wherein the random in-frame fusion polynucleotide further comprises an expression vector sequence.
 17. The method according to claim 11, wherein the random in-frame fusion polynucleotide further comprises at least one regulatory sequence.
 18. The method according to claim 17, wherein the regulatory sequence is a promoter or a terminator.
 19. The method according to claim 11, wherein the organism is a fungus.
 21. The method according to claim 11, wherein the composite open reading frame encoding the fusion polypeptide of step (d) is different from the fusion polypeptide present in the organisms of step (b).
 22. The method according to claim 11, wherein the at least two open reading frames isolated from the genome of the same organism are full-length.
 23. The method according to claim 11, wherein the organism is an alga.
 24. A method of producing an organism with a new or altered phenotype, comprising: (a) introducing a composition comprising a plurality of random in-frame fusion polynucleotides having sequences different from each other into an organism to produce one or more transformed organisms, each random in-frame fusion polynucleotide comprising at least two open reading frames isolated from the genome of at least two different species of organism, one designated as a 5′ open reading frame and one designated as a 3′ open reading frame, the 5′ open reading frame being joined either directly, or indirectly via at least one intervening open reading frame, to the 3′ open reading frame to form a composite open reading frame encoding a fusion polypeptide, wherein the species are bacteria, archaea, protozoa, yeast, cyanobacteria, fungus, alga, or plant; (b) isolating at least two transformed organisms exhibiting a different phenotype as compared to a control organism cultivated under the same conditions; (c) isolating at least the 5′ open reading frame and the 3′ open reading frame from each of at least two fusion polynucleotides present in the organisms of step (b); (d) randomly combining the isolated open reading frames of step (c) to form at least two random in-frame polynucleotides, each comprising a composite open reading frame encoding a fusion polypeptide; (e) introducing the polynucleotide(s) of step (d) into an organism to produce one or more transformed organisms; and (f) isolating at least one transformed organism from the organism(s) in step (e) exhibiting a different phenotype as compared to a control organism cultivated under the same conditions, wherein steps (c) through (f) are optionally repeated one or more times, wherein the organism is a fungus, an alga, a plant or an animal.
 25. The method of claim 24, wherein the at least two different species of step (a) are species of fungus, alga, or plant.
 26. The method of claim 25, wherein the at least two different species of step (a) are both fungi.
 27. The method of claim 25, wherein the at least two different species of step (a) are both algae.
 28. The method of claim 25, wherein the at least two different species of step (a) are both plants.
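
By way of illustration only, and without limiting the claims, the recited cycle of random in-frame fusion, selection, re-isolation and recombination may be modeled in silico along the following lines. This is a minimal Python sketch under stated assumptions: the function names (`fuse`, `random_pairs`, `one_cycle`), the toy ORF sequences and the scoring function are all hypothetical and do not appear in the specification, the numeric score merely stands in for a phenotype that would in practice be observed by cultivating transformed organisms, and the sketch assumes that a re-isolated open reading frame may occupy either the 5′ or the 3′ position on recombination.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}

def fuse(five_prime, three_prime, linker=""):
    """Join two ORF sequences in frame, optionally through a linker."""
    parts = (("5' ORF", five_prime), ("3' ORF", three_prime), ("linker", linker))
    for name, seq in parts:
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} is not a whole number of codons")
    # Drop the 5' stop codon so translation reads through into the 3' ORF.
    if five_prime[-3:] in STOP_CODONS:
        five_prime = five_prime[:-3]
    return five_prime + linker + three_prime

def random_pairs(five_names, three_names, size, rng):
    """Sample distinct (5', 3') ORF name pairs without replacement."""
    pairs = [(a, b) for a in five_names for b in three_names]
    return rng.sample(pairs, min(size, len(pairs)))

def one_cycle(library, score, keep, size, rng):
    """Select the best fusions, pool their constituent ORFs, recombine them."""
    # Steps (b)/(f): keep the fusions with the highest (hypothetical) score.
    selected = sorted(library, key=score, reverse=True)[:keep]
    # Step (c): re-isolate the constituent ORFs of the selected fusions.
    pool = sorted({name for pair in selected for name in pair})
    # Step (d): randomly recombine the re-isolated ORFs (assumption: any
    # re-isolated ORF may take either position, including self-fusions).
    return selected, random_pairs(pool, pool, size, rng)

# Toy usage with made-up three-codon ORFs.
orfs = {"orfA": "ATGAAATAA", "orfB": "ATGCCCTGA", "orfC": "ATGGGGTAG"}
rng = random.Random(0)
library = random_pairs(orfs, orfs, size=6, rng=rng)
toy_score = lambda pair: pair.count("orfB")  # hypothetical phenotype score
selected, next_library = one_cycle(library, toy_score, keep=2, size=4, rng=rng)
sequences = [fuse(orfs[a], orfs[b], linker="GGTGGT") for a, b in next_library]
```

In this sketch the fusions are tracked by ORF name and only assembled into sequences at the end, which keeps the re-isolation of step (c) trivial; a laboratory implementation would instead recover the constituent open reading frames from the selected organisms, for example by PCR.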